In the sequential change-point detection literature, most research specifies a required frequency of false alarms at a given pre-change distribution $f_\theta$ and tries to minimize the detection delay for every possible post-change distribution $g_\lambda$. In this paper, motivated by a number of practical examples, we first consider the reverse question by specifying a required detection delay at a given post-change distribution and trying to minimize the frequency of false alarms for every possible pre-change distribution $f_\theta$. We present asymptotically optimal procedures for one-parameter exponential families. Next, we develop a general theory for change-point problems when both the pre-change distribution $f_\theta$ and the post-change distribution $g_\lambda$ involve unknown parameters. We also apply our approach to the special case of detecting shifts in the mean of independent normal observations.

1. Introduction. Suppose there is a process that produces a sequence of independent observations $X_1, X_2, \ldots$. Initially the process is "in control" and the true distribution of the $X$'s is $f_\theta$ for some $\theta \in \Theta$. At some unknown time $\nu$ the process goes "out of control" in the sense that the distribution of $X_\nu, X_{\nu+1}, \ldots$ is $g_\lambda$ for some $\lambda \in \Lambda$. It is desirable to raise an alarm as soon as the process is out of control so that we can take appropriate action. This is known as a change-point problem, or quickest change detection problem. By analogy with hypothesis testing terminology [12], we will refer to $\Theta$ ($\Lambda$) as a "simple" pre-change (post-change) hypothesis if it contains a single point and as a "composite" pre-change (post-change) hypothesis if it contains more than one point.
The change-point problem originally arose from statistical quality control,
and now it has many other important applications, including reliability, 
fault detection, finance, signal detection, surveillance and security systems. 
Extensive research has been done in this field during the last few decades. 
For recent reviews, we refer readers to [1, 9] and the references therein. 

In the simplest case where both $\Theta$ and $\Lambda$ are simple, that is, the pre-change distribution $f_\theta$ and the post-change distribution $g_\lambda$ are completely specified, the problem is well understood and has been solved under a variety of criteria. Some popular schemes are Shewhart's control charts, moving average control charts, Page's CUSUM procedure and the Shiryayev-Roberts procedure; see [1, 17, 24, 25, 26]. The first asymptotic theory, using a minimax approach, was provided in [14].

In practice, the assumption of known pre-change distribution $f_\theta$ and post-change distribution $g_\lambda$ is too restrictive. Motivated by applications in statistical quality control, the standard formulation of a more flexible model assumes that $\Theta$ is simple and $\Lambda$ is composite, that is, $f_\theta$ is completely specified and the post-change distribution $g_\lambda$ involves an unknown parameter $\lambda$. See, for example, [9, 10, 11, 14, 20, 21, 29]. When the true $\theta$ of the pre-change distribution $f_\theta$ is unknown, it is typical to assume that a training sample is available so that one can use the method of "point estimation" to obtain a value $\theta_0$. However, it is well known that the performance of such procedures is very sensitive to the error in estimating $\theta$; see, for example, [30]. Thus we need to study change-point problems for composite pre-change hypotheses, which allow a range of "acceptable" values of $\theta$.

There are a few papers in the literature that use a parametric approach 
to deal with the case when the pre-change distribution involves unknown 
parameters (see, e.g., [6, 8, 22, 33, 34]), but all assume the availability of a 
training sample and/or the existence of an invariant structure. In this paper, 
we make no such assumptions. Our approach is motivated by the following 
examples. 

Example 1.1 (Water quality). Suppose we are interested in monitoring a contaminant, say antimony, in drinking water. Because of its potential health effects, the U.S. Environmental Protection Agency (EPA) sets a maximum contaminant level goal (MCLG) and a maximum contaminant level (MCL). An MCLG is a nonenforceable but desirable health-related goal established at the level where there is no known or expected risk to health. An MCL is the enforceable limit set as close to the MCLG as possible. For antimony, both MCL and MCLG are 0.006 mg/L. Thus the water quality is "in control" as long as the level of the contaminant is less than MCLG, and we should take prompt action if the level exceeds MCL.
Example 1.2 (Public health surveillance). Consider the surveillance of
the incidence of rare health events. If the underlying disease rate is greater 
than some specified level, we want to detect it quickly so as to enable early 
intervention from a public health point of view and to avoid a much greater 
tragedy. Otherwise, the disease is "in control." 

Example 1.3 (Change in variability). In statistical process control, 
sometimes one is concerned about possible changes in the variance. When 
the value of the variance is greater than some pre-specified constant, the 
process should be stopped and declared "out of control." However, when 
the process is in control, there typically is no unique target value for the 
variance, which should be as small as the process permits. 

Example 1.4 (Signal disappearance). Suppose that one is monitoring or tracking a weak signal in a noisy environment. If the signal disappears, one wants to detect the disappearance as quickly as possible. Parameters $\theta$ associated with the signal, for example, its strength, are described by a composite hypothesis before it disappears, but by a simple hypothesis (strength equal to zero) afterward.

The essential feature of these examples is that the need to take action in response to a change in a parameter $\theta$ can be defined by a fixed threshold value. This inspires us to study change-point problems where $\Theta$ is composite and $\Lambda$ is simple. Unlike the standard formulation, which specifies a required frequency of false alarms, our formulation specifies a required detection delay and seeks to minimize the frequency of false alarms for all possible pre-change distributions $f_\theta$. Section 2 uses this formulation to study the problem of detecting a change of the parameter value in a one-parameter exponential family. It is worth pointing out that the generalized likelihood ratio method does not provide asymptotically optimal procedures under our formulation.

It is natural to combine the standard formulation with our formulation by considering change-point problems when both $\Theta$ and $\Lambda$ are composite, that is, both the pre-change distribution and the post-change distribution involve unknown parameters. Ideally we want to optimize all possible false alarm rates and all possible detection delays. Unfortunately this cannot be done, and there is no attractive definition of optimality in the literature for this problem. In Section 3, we propose a useful definition of "asymptotically optimal to first order" procedures, thereby generalizing Lorden's asymptotic theory, and develop such procedures with the idea of an "optimizer."

This paper is organized as follows. In the remainder of this section we provide some notation and definitions based on the classical results for the change-point problem when both $\Theta$ and $\Lambda$ are simple. Section 2 establishes the asymptotic optimality of our proposed procedures for the problem of detecting a change of the parameter value in a one-parameter exponential family, and Section 3 develops an asymptotic theory for change-point problems when both the pre-change distribution and the post-change distribution involve unknown parameters. Both Sections 2 and 3 contain some numerical simulations. Section 4 illustrates the application of our general theory to the problem of detecting shifts in the mean of independent normal observations. Section 5 contains the proof of Theorem 2.1.

Denote by $P^{(\nu)}_{\theta,\lambda}$ and $E^{(\nu)}_{\theta,\lambda}$ the probability measure and expectation, respectively, when $X_1, \ldots, X_{\nu-1}$ are distributed according to a pre-change distribution $f_\theta$ for some $\theta \in \Theta$ and $X_\nu, X_{\nu+1}, \ldots$ are distributed according to a post-change distribution $g_\lambda$ for some $\lambda \in \Lambda$. We shall also use $P_\theta$ and $E_\theta$ to denote the probability measure and expectation, respectively, under which $X_1, X_2, \ldots$ are independent and identically distributed with density $f_\theta$ (corresponding to $\nu = \infty$). In change-point problems, a procedure for detecting that a change has occurred is defined as a stopping time $N$ with respect to $\{X_n\}_{n \ge 1}$. The interpretation of $N$ is that, when $N = n$, we stop at $n$ and declare that a change has occurred somewhere in the first $n$ observations. The performance of $N$ is evaluated by two criteria: the long and short average run lengths (ARL). The long ARL is defined by $E_\theta N$. Imagining repeated applications of such procedures, practitioners refer to the frequency of false alarms as $1/E_\theta N$ and the mean time between false alarms as $E_\theta N$. The short ARL can be defined by the following worst case detection delay, proposed by Lorden [14]:

$$\bar E_\lambda N = \sup_{\nu \ge 1} \Big( \operatorname*{ess\,sup} E^{(\nu)}_{\theta,\lambda}\big[ (N - \nu + 1)^+ \,\big|\, X_1, \ldots, X_{\nu-1} \big] \Big).$$

Note that the definition of $\bar E_\lambda N$ does not depend upon the pre-change distribution $f_\theta$ by virtue of the essential supremum, which takes the "worst possible $X$'s before the change." In our theorems we can also use the average detection delay, proposed by Shiryayev [25] and Pollak [19], $\sup_{\theta \in \Theta} \big( \sup_{\nu \ge 1} E^{(\nu)}_{\theta,\lambda}(N - \nu \mid N \ge \nu) \big)$, which is asymptotically equivalent to $\bar E_\lambda N$.

If $\Theta$ and $\Lambda$ are simple, say $\Theta = \{\theta\}$ and $\Lambda = \{\lambda\}$, Page's CUSUM procedure is defined by

$$T_{\mathrm{CM}}(\theta, a) = \inf\Big\{ n \ge 1 : \max_{1 \le k \le n} \sum_{i=k}^{n} \log \frac{g_\lambda(X_i)}{f_\theta(X_i)} \ge a \Big\}, \tag{1.1}$$

where the notation $T_{\mathrm{CM}}(\theta, a)$ is used to emphasize that the pre-change distribution is $f_\theta$. Moustakides [16] and Ritov [23] showed that Page's CUSUM procedure $T_{\mathrm{CM}}(\theta, a)$ is exactly optimal in the following minimax sense: for any $a > 0$, $T_{\mathrm{CM}}(\theta, a)$ minimizes $\bar E_\lambda N$ among all stopping times $N$ satisfying $E_\theta N \ge E_\theta T_{\mathrm{CM}}(\theta, a)$. Earlier Lorden [14] proved this property holds asymptotically.
Specifically, Lorden [14] showed that for each pair $(\theta, \lambda)$

$$\bar E_\lambda N \ge (1 + o(1)) \frac{\log E_\theta N}{I(\lambda, \theta)} \tag{1.2}$$

as $E_\theta N \to \infty$, and $T_{\mathrm{CM}}(\theta, a)$ attains the lower bound asymptotically. Here $I(\lambda, \theta) = E_\lambda \log\big(g_\lambda(X)/f_\theta(X)\big)$ is the Kullback-Leibler information number. This suggests defining the asymptotic efficiency of a family $\{N(a)\}$ as

$$e(\theta, \lambda) = \liminf_{a \to \infty} \frac{\log E_\theta N(a)}{I(\lambda, \theta)\, \bar E_\lambda N(a)}, \tag{1.3}$$

where $\{N(a)\}$ is required to satisfy $E_\theta N(a) \to \infty$ as $a \to \infty$. Then $e(\theta, \lambda) \le 1$ for all families, so we can define:

Definition 1.1. A family of stopping times $\{N(a)\}$ is asymptotically efficient at $(\theta, \lambda)$ if $e(\theta, \lambda) = 1$.

It follows that Page's CUSUM procedure $T_{\mathrm{CM}}(\theta, a)$ for detecting a change in distribution from $f_\theta$ to $g_\lambda$ is asymptotically efficient at $(\theta, \lambda)$. However, $T_{\mathrm{CM}}(\theta, a)$ in general will not be asymptotically efficient at $(\theta', \lambda)$ if $\theta' \ne \theta$; see Section 2.4 in [31], equation (2.57) in [28] and Table 1 in [5].

2. Simple post-change hypotheses. It will be assumed in this section and only in this section that $f_\theta$ and $g_\lambda = f_\lambda$ belong to a one-parameter exponential family

$$f_\xi(x) = \exp\big(\xi x - b(\xi)\big), \quad -\infty < x < \infty, \ \xi \in \Omega, \tag{2.1}$$

with natural parameter space $\Omega = (\underline{\xi}, \overline{\xi})$ with respect to a $\sigma$-finite measure $F$. Then $b(\xi)$ is strictly convex on $\Omega$. Assume that $\Theta = [\theta_0, \theta_1]$ is a subset of $\Omega$, and $\lambda$ is a given value outside the interval $\Theta$, say $\lambda > \theta_1$. In this section we consider the problem of detecting a change in distribution from $f_\theta$ for some $\theta \in \Theta$ to $f_\lambda$, and we want to find a stopping time $N$ such that $E_\theta N$ is as large as possible for each $\theta \in \Theta = [\theta_0, \theta_1]$ subject to the constraint

$$\bar E_\lambda N \le \gamma, \tag{2.2}$$

where $\gamma > 0$ is a given constant and $\lambda \notin \Theta$.

One cannot simultaneously maximize $E_\theta N$ for all $\theta \in \Theta$ subject to (2.2), since the maximum for each $\theta$ is uniquely attained by Page's CUSUM procedure $T_{\mathrm{CM}}(\theta, a)$ in (1.1). As one referee pointed out, if one wants to maximize $\inf_{\theta \in \Theta} E_\theta N$ subject to (2.2), then the exactly optimal solution is Page's CUSUM procedure $T_{\mathrm{CM}}(\theta_1, a)$ for detecting a change in distribution from $f_{\theta_1}$ to $f_\lambda$. This is because $\inf_{\theta \in \Theta} E_\theta N \le E_{\theta_1} N$, with equality holding for $N = T_{\mathrm{CM}}(\theta_1, a)$, which maximizes $E_{\theta_1} N$ among all stopping times $N$ satisfying $\bar E_\lambda N \le \bar E_\lambda T_{\mathrm{CM}}(\theta_1, a)$. In other words, this setup is equivalent to the simplest problem of detecting a change in distribution from $f_{\theta_1}$ to $f_\lambda$.

In this section, rather than be satisfied with just $\inf_{\theta \in \Theta} E_\theta N$, a lower bound on $E_\theta N$ over $\theta \in \Theta$, we want to maximize $E_\theta N$ asymptotically for each $\theta \in \Theta$ as $\gamma \to \infty$, or equivalently, to find a family of stopping times that is asymptotically efficient at $(\theta, \lambda)$ for every $\theta \in \Theta = [\theta_0, \theta_1]$.

Before studying change-point problems in Section 2.2, we first consider 
the corresponding open-ended hypothesis testing problems in Section 2.1, 
since the basic arguments are clearer for hypothesis testing problems and 
are readily extendable to change-point problems. 

2.1. Open-ended hypothesis testing. Suppose $X_1, X_2, \ldots$ are independent and identically distributed random variables with probability density of the form (2.1) on the natural parameter space $\Omega = (\underline{\xi}, \overline{\xi})$. Suppose we are interested in testing the null hypothesis

$$H_0 : \xi \in \Theta = [\theta_0, \theta_1]$$

against the alternative hypothesis

$$H_1 : \xi \in \Lambda = \{\lambda\},$$

where $\underline{\xi} < \theta_0 < \theta_1 < \lambda < \overline{\xi}$.

Motivated by applications to change-point problems, we consider the following open-ended hypothesis testing problems. Assume that if $H_0$ is true, sampling costs nothing and our preferred action is just to observe $X_1, X_2, \ldots$ without stopping. On the other hand, if $H_1$ is true, each observation costs a fixed amount and we want to stop sampling as soon as possible and reject the null hypothesis $H_0$.

Since there is only one terminal decision, a statistical procedure for an open-ended hypothesis testing problem is defined by a stopping time $N$. The null hypothesis $H_0$ is rejected if and only if $N < \infty$. A good procedure $N$ should keep the error probabilities $P_\theta(N < \infty)$ small for every $\theta \in \Theta$ while keeping $E_\lambda N$ small.

The problem in this subsection is to find a stopping time $N$ such that $P_\theta(N < \infty)$ will be as small as possible for every $\theta \in \Theta = [\theta_0, \theta_1]$ subject to the constraint

$$E_\lambda N \le \gamma, \tag{2.3}$$

where $\gamma > 0$ is a given constant.

For each $\theta \in \Theta$, by [32], the minimum of $P_\theta(N < \infty)$ is uniquely attained by the one-sided sequential probability ratio test (SPRT) of $H_0^{\theta} : \xi = \theta$ versus $H_1 : \xi = \lambda$, which stops at

$$\inf\Big\{ n \ge 1 : \sum_{i=1}^{n} \log \frac{f_\lambda(X_i)}{f_\theta(X_i)} \ge C_\theta \Big\}$$

for a suitable boundary $C_\theta$.
In order to satisfy (2.3), it is well known that $C_\theta \sim I(\lambda, \theta)\gamma$ as $\gamma \to \infty$; see, for example, page 26 of [28]. A simple observation is that the null hypothesis is expressed as a union of the individual null hypotheses $H_0^{\theta} : \xi = \theta$, and so the intersection-union method (see [2]) suggests considering the stopping time

$$M(a) = \inf\Big\{ n \ge 1 : \sum_{i=1}^{n} \log \frac{f_\lambda(X_i)}{f_\theta(X_i)} \ge I(\lambda, \theta)\, a \ \text{ for all } \theta_0 \le \theta \le \theta_1 \Big\}. \tag{2.4}$$

The rationale is that $H_0$ can be rejected only if each of the individual null hypotheses $H_0^{\theta} : \xi = \theta$ can be rejected.

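In order to study the behavior of $M(a)$, it is useful to express $M(a)$ in terms of $S_n = X_1 + \cdots + X_n$. Define

$$\phi(\theta) = \frac{b(\lambda) - b(\theta)}{\lambda - \theta}. \tag{2.5}$$

Then by (2.1), the stopping time $M(a)$ can be written as

$$M(a) = \inf\Big\{ n \ge 1 : S_n \ge b'(\lambda)\, a + \sup_{\theta_0 \le \theta \le \theta_1} \big[(n - a)\phi(\theta)\big] \Big\} \tag{2.6}$$

because $\lambda > \theta_1$. Indeed, $\log\big(f_\lambda(x)/f_\theta(x)\big) = (\lambda - \theta)\big(x - \phi(\theta)\big)$ and $I(\lambda, \theta) = (\lambda - \theta)\big(b'(\lambda) - \phi(\theta)\big)$, so dividing the condition in (2.4) by $\lambda - \theta > 0$ gives $S_n \ge b'(\lambda)a + (n - a)\phi(\theta)$. Now $\phi(\theta)$ is an increasing function since $b(\theta)$ is convex; thus the supremum in (2.6) is attained at $\theta = \theta_0$ if $n \le a$, and at $\theta = \theta_1$ if $n \ge a$. Therefore, $M(a)$ is equivalent to the simpler test which uses two simultaneous SPRTs (with appropriate boundaries), one for each of the individual null hypotheses $\theta_0$ and $\theta_1$. This fact makes it convenient for theoretical analysis and numerical simulations.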
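The reduction from (2.4) to (2.6) can be checked numerically. Below is a minimal Python sketch that assumes the normal family $f_\xi = N(\xi, 1)$, for which $b(\xi) = \xi^2/2$, $b'(\lambda) = \lambda$, $\phi(\theta) = (\lambda + \theta)/2$ and $I(\lambda, \theta) = (\lambda - \theta)^2/2$. It compares the two-endpoint form of (2.6) with a brute-force version of (2.4) in which the condition is imposed on a finite grid of $\theta$ values. The function names and the grid are illustrative choices, not part of the paper.

```python
import random

# Assumed setting: f_xi = N(xi, 1), so b(xi) = xi**2 / 2, b'(lam) = lam,
# phi(theta) = (lam + theta) / 2 and I(lam, theta) = (lam - theta)**2 / 2.

def llr(x, lam, theta):
    """log f_lam(x) / f_theta(x) = (lam - theta) * (x - phi(theta))."""
    return (lam - theta) * (x - (lam + theta) / 2)

def kl(lam, theta):
    """Kullback-Leibler number I(lam, theta)."""
    return (lam - theta) ** 2 / 2

def m_two_endpoints(xs, a, theta0, theta1, lam):
    """M(a) via (2.6): the sup over theta is attained at theta0 or theta1."""
    phi0, phi1 = (lam + theta0) / 2, (lam + theta1) / 2
    s = 0.0
    for n, x in enumerate(xs, start=1):
        s += x                                          # S_n
        bound = lam * a + max((n - a) * phi0, (n - a) * phi1)
        if s >= bound:
            return n
    return None                                         # H_0 not rejected within the data

def m_grid(xs, a, thetas, lam):
    """M(a) via (2.4): the SPRT condition must hold for every theta on the grid."""
    sums = [0.0] * len(thetas)
    for n, x in enumerate(xs, start=1):
        for j, t in enumerate(thetas):
            sums[j] += llr(x, lam, t)
        if all(s >= kl(lam, t) * a for s, t in zip(sums, thetas)):
            return n
    return None

theta0, theta1, lam, a = -1.0, -0.5, 0.0, 10.0
grid = [theta0 + (theta1 - theta0) * j / 10 for j in range(11)]   # includes both endpoints
rng = random.Random(0)
xs = [rng.gauss(lam, 1.0) for _ in range(200)]          # data generated under H_1
print(m_two_endpoints(xs, a, theta0, theta1, lam), m_grid(xs, a, grid, lam))
```

On simulated data the two stopping times should coincide (apart from floating-point ties at the boundary), which is why only the SPRTs at $\theta_0$ and $\theta_1$ need to be tracked.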

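The following theorem, whose proof is given in Section 5, establishes the asymptotic properties of $M(a)$ for large $a$.

Theorem 2.1. For any $a > 0$ and all $\theta_0 \le \theta \le \theta_1$,

$$\frac{|\log P_\theta(M(a) < \infty)|}{I(\lambda, \theta)} \ge a, \tag{2.7}$$

and as $a \to \infty$,

$$E_\lambda M(a) = a + (C + o(1))\sqrt{a}, \tag{2.8}$$

where

$$C = \left( \frac{\lambda - \theta_1}{I(\lambda, \theta_1)} - \frac{\lambda - \theta_0}{I(\lambda, \theta_0)} \right) \sqrt{\frac{b''(\lambda)}{2\pi}} > 0. \tag{2.9}$$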
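To get a feel for the size of the second-order term in (2.8), the sketch below evaluates $C$ from (2.9). It assumes, for illustration, the normal family $f_\xi = N(\xi, 1)$, where $b''(\lambda) = 1$ and $I(\lambda, \theta) = (\lambda - \theta)^2/2$. With $\theta_0 = -1$, $\theta_1 = -0.5$ and $\lambda = 0$ (the setting of Section 2.4) this gives $C = 2/\sqrt{2\pi} \approx 0.80$. The helper `c_constant` is hypothetical. The printed values keep only the leading terms $a + C\sqrt{a}$ and drop the $o(\sqrt{a})$ remainder.

```python
import math

def c_constant(theta0, theta1, lam, kl, b2):
    """C in (2.9), given I(lam, theta) as kl(lam, theta) and b''(lam) as b2."""
    r1 = (lam - theta1) / kl(lam, theta1)     # (lam - theta1) / I(lam, theta1)
    r0 = (lam - theta0) / kl(lam, theta0)     # (lam - theta0) / I(lam, theta0)
    return (r1 - r0) * math.sqrt(b2 / (2 * math.pi))

# Assumed example: unit-variance normal family, so I = (lam - theta)**2 / 2 and b'' = 1.
normal_kl = lambda lam, theta: (lam - theta) ** 2 / 2
c = c_constant(-1.0, -0.5, 0.0, normal_kl, 1.0)
for a in (5.0, 10.0, 18.5):
    print(a, round(a + c * math.sqrt(a), 2))   # leading terms of (2.8) only
```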



The following corollary establishes the asymptotic optimality of $M(a)$.
Corollary 2.1. Suppose $\{N(a)\}$ is a family of stopping times such that $E_\lambda N(a) \le E_\lambda M(a)$. For all $\theta_0 \le \theta \le \theta_1$, as $a \to \infty$,

$$\frac{|\log P_\theta(N(a) < \infty)|}{I(\lambda, \theta)} \le a + (C + o(1))\sqrt{a},$$

where $C$ is as defined in (2.9). Thus $M(a)$ asymptotically minimizes the error probabilities $P_\theta(N < \infty)$ for every $\theta \in \Theta = [\theta_0, \theta_1]$ among all stopping times $N$ such that $E_\lambda N \le E_\lambda M(a)$.

Proof. The corollary follows directly from Theorem 2.1 and the well-known fact that

$$\frac{|\log P_\theta(N(a) < \infty)|}{I(\lambda, \theta)} \le E_\lambda N(a)$$

for all $\theta \in [\theta_0, \theta_1]$. □

2.2. Change-point problems. Now let us consider the problem of detecting a change in distribution from $f_\theta$ for some $\theta \in \Theta = [\theta_0, \theta_1]$ to $f_\lambda$. As described earlier, we seek a family of stopping times that is asymptotically efficient at $(\theta, \lambda)$ for every $\theta \in \Theta$.

A method for finding such a family is suggested by the following result, which indicates the relationship between open-ended hypothesis testing and change-point problems.

Lemma 2.1 (Lorden [14]). Let $N$ be a stopping time with respect to $X_1, X_2, \ldots$. For $k = 1, 2, \ldots$, let $N_k$ denote the stopping time obtained by applying $N$ to $X_k, X_{k+1}, \ldots$, and define

$$N^* = \min_{k \ge 1} (N_k + k - 1).$$

Then $N^*$ is a stopping time with

$$E_\theta N^* \ge 1/P_\theta(N < \infty) \quad \text{and} \quad \bar E_\lambda N^* \le E_\lambda N$$

for any $\theta$ and $\lambda$.
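Lemma 2.1 is constructive, and a direct (quadratic-time) sketch makes the construction concrete. For illustration only, take $N$ to be the one-sided SPRT $\inf\{n \ge 1 : \sum_{i=1}^n \ell(X_i) \ge c\}$ for a log-likelihood ratio $\ell$. Restarting it at every $k$ and taking $\min_k (N_k + k - 1)$ then reproduces Page's CUSUM (1.1), which the recursion $W_n = \max\{W_{n-1}, 0\} + \ell(X_n)$ computes in constant time per step. The function names are hypothetical, and the normal example $\ell(x) = x + 0.5$ (the log-likelihood ratio of $N(0,1)$ to $N(-1,1)$) is an assumption.

```python
import random

def sprt_from(xs, start, llr, c):
    """Apply the one-sided SPRT to xs[start:]; return the absolute (1-based) stopping time."""
    s = 0.0
    for idx in range(start, len(xs)):
        s += llr(xs[idx])
        if s >= c:
            return idx + 1              # equals N_k + k - 1 with k = start + 1
    return None

def lorden_star(xs, llr, c):
    """N* = min_k (N_k + k - 1), computed by brute force over all restarts k."""
    times = [sprt_from(xs, k, llr, c) for k in range(len(xs))]
    finite = [t for t in times if t is not None]
    return min(finite) if finite else None

def cusum(xs, llr, c):
    """Page's CUSUM: W_n = max(W_{n-1}, 0) + llr(X_n); stop when W_n >= c."""
    w = 0.0
    for n, x in enumerate(xs, start=1):
        w = max(w, 0.0) + llr(x)
        if w >= c:
            return n
    return None

llr = lambda x: x + 0.5                 # log of the N(0,1) / N(-1,1) density ratio
rng = random.Random(1)
xs = [rng.gauss(-1.0, 1.0) for _ in range(50)] + [rng.gauss(0.0, 1.0) for _ in range(100)]
print(lorden_star(xs, llr, 5.0), cusum(xs, llr, 5.0))
```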

Let $M(a)$ be the stopping time defined in (2.4), and let $M_k(a)$ be the stopping time obtained by applying $M(a)$ to the observations $X_k, X_{k+1}, \ldots$. Define a new stopping time by $M^*(a) = \min_{k \ge 1}(M_k(a) + k - 1)$. In other words,

$$M^*(a) = \inf\Big\{ n \ge 1 : \max_{1 \le k \le n} \inf_{\theta_0 \le \theta \le \theta_1} \Big[ \sum_{i=k}^{n} \log \frac{f_\lambda(X_i)}{f_\theta(X_i)} - I(\lambda, \theta)\, a \Big] \ge 0 \Big\}. \tag{2.10}$$

The next theorem establishes the asymptotic performance of $M^*(a)$, which immediately implies that the family $\{M^*(a)\}$ is asymptotically efficient at $(\theta, \lambda)$ for every $\theta \in \Theta$.
Theorem 2.2. For any $a > 0$ and $\theta_0 \le \theta \le \theta_1$,

$$E_\theta M^*(a) \ge \exp\big(I(\lambda, \theta)\, a\big), \tag{2.11}$$

and as $a \to \infty$,

$$\bar E_\lambda M^*(a) \le a + (C + o(1))\sqrt{a}, \tag{2.12}$$

where $C$ is as defined in (2.9). Moreover, if $\{N(a)\}$ is a family of stopping times such that (2.11) holds for some $\theta$ with $N(a)$ replacing $M^*(a)$, then

$$\bar E_\lambda N(a) \ge a + O(1) \quad \text{as } a \to \infty. \tag{2.13}$$

Proof. Relations (2.11) and (2.12) follow at once from Theorem 2.1 and Lemma 2.1. Relation (2.13) follows from the following proposition, which improves Lorden's lower bound in (1.2). □

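Proposition 2.1. Given $\theta$ and $\lambda \ne \theta$, there exists an $M = M(\theta, \lambda) > 0$ such that for any stopping time $N$,

$$\log E_\theta N \le I(\lambda, \theta)\, \bar E_\lambda N + M. \tag{2.14}$$

Proof. By equation (2.53) on page 26 of [28], there exist $C_1$ and $C_2$ such that for Page's CUSUM procedure $T_{\mathrm{CM}}(\theta, a)$ in (1.1),

$$E_\theta T_{\mathrm{CM}}(\theta, a) \le C_1 e^{a} \quad \text{and} \quad I(\lambda, \theta)\, \bar E_\lambda T_{\mathrm{CM}}(\theta, a) \ge a - C_2$$

for all $a > 0$. For any given stopping time $N$, choose $a = \log E_\theta N - \log C_1$; then $E_\theta N = C_1 e^{a} \ge E_\theta T_{\mathrm{CM}}(\theta, a)$. The optimality property of $T_{\mathrm{CM}}(\theta, a)$ [16] implies that

$$I(\lambda, \theta)\, \bar E_\lambda N \ge I(\lambda, \theta)\, \bar E_\lambda T_{\mathrm{CM}}(\theta, a) \ge a - C_2 = \log E_\theta N - \log C_1 - C_2. \qquad □$$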

The following corollary follows at once from Theorem 2.2.

Corollary 2.2. Suppose $\{N(a)\}$ is a family of stopping times such that

$$\bar E_\lambda N(a) \le \bar E_\lambda M^*(a).$$

Then for all $\theta_0 \le \theta \le \theta_1$, as $a \to \infty$,

$$\frac{\log E_\theta N(a)}{I(\lambda, \theta)} \le a + (C + o(1))\sqrt{a},$$

where $C$ is as defined in (2.9). Thus, as $a \to \infty$, $M^*(a)$ asymptotically maximizes $\log E_\theta N$ [up to $O(\sqrt{a})$] for every $\theta \in [\theta_0, \theta_1]$ among all stopping times $N$ such that $\bar E_\lambda N \le \bar E_\lambda M^*(a)$.
Remark. The $O(\sqrt{a})$ terms are the price one must pay for optimality at every pre-change distribution $f_\theta$.

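In order to implement the stopping time $M^*(a)$ numerically, using (2.6), we can express $M^*(a)$ in the following convenient form:

$$M^*(a) = \inf\Big\{ n \ge 1 : \max_{n - b < k \le n} \sum_{i=k}^{n} \log \frac{f_\lambda(X_i)}{f_{\theta_0}(X_i)} \ge I(\lambda, \theta_0)\, a, \ \text{ or } \ W_{n-b} + \sum_{i=n-b+1}^{n} \log \frac{f_\lambda(X_i)}{f_{\theta_1}(X_i)} \ge I(\lambda, \theta_1)\, a \Big\}, \tag{2.15}$$

where $b = \lfloor a \rfloor$, $W_k = \max\{W_{k-1}, 0\} + \log\big(f_\lambda(X_k)/f_{\theta_1}(X_k)\big)$ and $W_0 = 0$ (the first maximum runs over $k \ge 1$ only, and the second condition applies only when $n > b$). Since $W_k$ can be calculated recursively, this form reduces the memory requirements at every stage $n$ from the full data set $\{X_1, \ldots, X_n\}$ to the data set of size $b + 1$, that is, $\{X_{n-b}, X_{n-b+1}, \ldots, X_n\}$. It is easy to see that this form involves only $O(a)$ computations at every stage $n$.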
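The form (2.15) translates directly into code. The following Python sketch is one possible implementation under the normal assumption $f_\xi = N(\xi, 1)$ used in Section 2.4. It keeps the last $b = \lfloor a \rfloor$ log-likelihood ratios in a deque together with the scalar $W_{n-b}$. As a check, it compares the result against a brute-force evaluation of (2.10) with the infimum over $\theta$ taken on a finite grid. Function names and the grid are hypothetical.

```python
import math
import random
from collections import deque

# Assumed setting: f_xi = N(xi, 1), so log f_lam/f_theta(x) = (lam - theta)(x - (lam + theta)/2)
# and I(lam, theta) = (lam - theta)**2 / 2.
def llr(x, lam, theta):
    return (lam - theta) * (x - (lam + theta) / 2)

def kl(lam, theta):
    return (lam - theta) ** 2 / 2

def m_star_recursive(xs, a, theta0, theta1, lam):
    """M*(a) in the form (2.15): last b = floor(a) log-LRs plus the scalar W_{n-b}."""
    b = math.floor(a)
    window = deque()   # (llr at theta0, llr at theta1) for X_{n-b+1}, ..., X_n
    w = 0.0            # W_{n-b}, with W_0 = 0
    for n, x in enumerate(xs, start=1):
        window.append((llr(x, lam, theta0), llr(x, lam, theta1)))
        if len(window) > b:
            # X_{n-b} leaves the window: W_{n-b} = max(W_{n-b-1}, 0) + llr_theta1(X_{n-b})
            _, l1 = window.popleft()
            w = max(w, 0.0) + l1
        # theta0 condition: segments X_k..X_n of length at most b (n - b < k <= n)
        s0, best0 = 0.0, -math.inf
        for l0, _ in reversed(window):
            s0 += l0
            best0 = max(best0, s0)
        if best0 >= kl(lam, theta0) * a:
            return n
        # theta1 condition: segments of length greater than b (1 <= k <= n - b)
        if n > b and w + sum(l1 for _, l1 in window) >= kl(lam, theta1) * a:
            return n
    return None

def m_star_brute(xs, a, thetas, lam):
    """M*(a) straight from (2.10), with the inf over theta taken on a finite grid."""
    for n in range(1, len(xs) + 1):
        sums = [0.0] * len(thetas)
        for k in range(n, 0, -1):                       # segment X_k, ..., X_n
            for j, t in enumerate(thetas):
                sums[j] += llr(xs[k - 1], lam, t)
            if min(s - kl(lam, t) * a for s, t in zip(sums, thetas)) >= 0:
                return n
    return None

theta0, theta1, lam = -1.0, -0.5, 0.0
grid = [theta0 + (theta1 - theta0) * j / 10 for j in range(11)]
rng = random.Random(2)
xs = [rng.gauss(-0.8, 1.0) for _ in range(20)] + [rng.gauss(lam, 1.0) for _ in range(60)]
print(m_star_recursive(xs, 8.5, theta0, theta1, lam), m_star_brute(xs, 8.5, grid, lam))
```

The recursive version does $O(a)$ work per observation, while the brute-force check is quadratic in $n$ and is meant only for testing.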

As an Associate Editor noted, there are other procedures that can have the same asymptotic optimality properties as $M^*(a)$. For example, if we define a slightly different procedure $M_1^*(a)$ by switching $\inf_{\theta_0 \le \theta \le \theta_1}$ with $\max_{1 \le k \le n}$ in the definition of $M^*(a)$ in (2.10), or if we define $M_2^*(a) = \sup_{\theta_0 \le \theta \le \theta_1} \{T_{\mathrm{CM}}(\theta, I(\lambda, \theta)a)\}$, where $T_{\mathrm{CM}}(\theta, I(\lambda, \theta)a)$ is Page's CUSUM procedure for detecting a change in distribution from $f_\theta$ to $f_\lambda$ with log-likelihood ratio boundary $I(\lambda, \theta)a$, then both $M_1^*(a)$ and $M_2^*(a)$ are well-defined stopping times that are asymptotically efficient at $(\theta, \lambda)$ for every $\theta \in \Theta$. However, both $M_1^*(a)$ and $M_2^*(a)$ are difficult to implement, although one can easily implement their approximations which replace $\Theta = [\theta_0, \theta_1]$ by a (properly chosen) finite subset of $\Theta$.
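Below is a sketch of the finite-grid approximations mentioned above. It again assumes $f_\xi = N(\xi, 1)$ and uses a hypothetical grid on $\Theta$. Each grid point carries its own CUSUM statistic with boundary $I(\lambda, \theta)a$. $M_1^*(a)$, with the infimum moved outside the maximum, stops the first time all statistics are simultaneously at or above their boundaries. $M_2^*(a)$, the supremum of the CUSUM stopping times, stops once every CUSUM has crossed at least once.

```python
import random

# Assumed normal setting: f_xi = N(xi, 1).
def llr(x, lam, theta):
    return (lam - theta) * (x - (lam + theta) / 2)

def kl(lam, theta):
    return (lam - theta) ** 2 / 2

def grid_m1_m2(xs, a, thetas, lam):
    """Finite-grid approximations of M_1^*(a) and M_2^*(a).

    Each grid point runs its own CUSUM W_n(theta) against the boundary I(lam, theta) * a.
    M_1^*: first n at which all CUSUM statistics are at or above their boundaries at once.
    M_2^*: first n by which every CUSUM has crossed at least once (sup of the stopping times).
    """
    w = [0.0] * len(thetas)
    crossed = [False] * len(thetas)
    m1 = m2 = None
    for n, x in enumerate(xs, start=1):
        above = []
        for j, t in enumerate(thetas):
            w[j] = max(w[j], 0.0) + llr(x, lam, t)
            above.append(w[j] >= kl(lam, t) * a)
            crossed[j] = crossed[j] or above[j]
        if m1 is None and all(above):
            m1 = n
        if m2 is None and all(crossed):
            m2 = n
        if m1 is not None and m2 is not None:
            break
    return m1, m2

theta0, theta1 = -1.0, -0.5
grid = [theta0 + (theta1 - theta0) * j / 10 for j in range(11)]   # hypothetical grid
rng = random.Random(3)
xs = [rng.gauss(0.0, 1.0) for _ in range(500)]
print(grid_m1_m2(xs, 8.0, grid, 0.0))
```

By construction $M_2^*(a) \le M_1^*(a)$ on any grid, because at time $M_1^*(a)$ every CUSUM in the bank is above its boundary.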

It is important to emphasize that in all the above procedures we should choose appropriate stopping boundaries. Otherwise the procedures may not be asymptotically efficient at every $\theta \in \Theta$. For instance, motivated by the generalized likelihood ratio method, one may want to use the procedure

$$T'(a) = \inf\Big\{ n \ge 1 : \max_{1 \le k \le n} \log \frac{f_\lambda(X_k) \cdots f_\lambda(X_n)}{\sup_{\theta_0 \le \theta \le \theta_1} \big(f_\theta(X_k) \cdots f_\theta(X_n)\big)} \ge a \Big\} = \inf\Big\{ n \ge 1 : \max_{1 \le k \le n} \inf_{\theta_0 \le \theta \le \theta_1} (\lambda - \theta) \sum_{i=k}^{n} \big(X_i - \phi(\theta)\big) \ge a \Big\},$$

where $\phi(\theta)$ is defined in (2.5). Unfortunately, for all $a > 0$, $T'(a)$ is equivalent to Page's CUSUM procedure $T_{\mathrm{CM}}(\theta_1, a)$, and thus it will not be asymptotically efficient at every $\theta$. To see this, first note that $T'(a) \ge T_{\mathrm{CM}}(\theta_1, a)$ by their definitions. Next, if $T_{\mathrm{CM}}(\theta_1, a)$ stops at time $n_0$, then for some $1 \le k_0 \le n_0$, $\sum_{i=k_0}^{n_0} \big(X_i - \phi(\theta_1)\big) \ge a/(\lambda - \theta_1)$ since $\lambda > \theta_1$. Thus, if $a > 0$, then for all $\theta_0 \le \theta \le \theta_1 \ (< \lambda)$,

$$\sum_{i=k_0}^{n_0} \big(X_i - \phi(\theta)\big) \ge \sum_{i=k_0}^{n_0} \big(X_i - \phi(\theta_1)\big) \ge \frac{a}{\lambda - \theta_1} \ge \frac{a}{\lambda - \theta},$$

because $\phi(\theta)$ is an increasing function of $\theta$. This implies that $T'(a)$ stops before or at time $n_0$, and so $T'(a) \le T_{\mathrm{CM}}(\theta_1, a)$. Therefore, $T'(a) = T_{\mathrm{CM}}(\theta_1, a)$. Similarly, if one considers $T''(a) = \sup_{\theta_0 \le \theta \le \theta_1} \{T_{\mathrm{CM}}(\theta, a)\}$, then $T''(a)$ is also equivalent to $T_{\mathrm{CM}}(\theta_1, a)$, because for all $a > 0$, Page's CUSUM procedure $T_{\mathrm{CM}}(\theta, a)$ is increasing as a function of $\theta \in [\theta_0, \theta_1]$ in the sense that $T_{\mathrm{CM}}(\theta, a) \le T_{\mathrm{CM}}(\theta', a)$ if $\theta \le \theta'$.
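The equivalence $T'(a) = T''(a) = T_{\mathrm{CM}}(\theta_1, a)$ is easy to confirm numerically. This sketch assumes $f_\xi = N(\xi, 1)$ and replaces $[\theta_0, \theta_1]$ by a finite grid that contains $\theta_1$. Function names and the grid are hypothetical.

```python
import random

# Assumed normal setting: f_xi = N(xi, 1).
def llr(x, lam, theta):
    return (lam - theta) * (x - (lam + theta) / 2)

def cusum_time(xs, a, theta, lam):
    """Page's CUSUM T_CM(theta, a) with log-likelihood ratio boundary a."""
    w = 0.0
    for n, x in enumerate(xs, start=1):
        w = max(w, 0.0) + llr(x, lam, theta)
        if w >= a:
            return n
    return None

def t_prime(xs, a, thetas, lam):
    """T'(a): max over k of inf over the grid of sum_{i=k}^n llr_theta(X_i), compared with a."""
    for n in range(1, len(xs) + 1):
        sums = [0.0] * len(thetas)
        for k in range(n, 0, -1):                      # segment X_k, ..., X_n
            for j, t in enumerate(thetas):
                sums[j] += llr(xs[k - 1], lam, t)
            if min(sums) >= a:
                return n
    return None

def t_double_prime(xs, a, thetas, lam):
    """T''(a): the largest of the CUSUM stopping times T_CM(theta, a) over the grid."""
    times = [cusum_time(xs, a, t, lam) for t in thetas]
    return None if None in times else max(times)

theta0, theta1, lam, a = -1.0, -0.5, 0.0, 3.0
grid = [theta0 + (theta1 - theta0) * j / 10 for j in range(11)]   # last point is theta1
rng = random.Random(5)
xs = [rng.gauss(-0.8, 1.0) for _ in range(40)] + [rng.gauss(lam, 1.0) for _ in range(100)]
print(t_prime(xs, a, grid, lam), t_double_prime(xs, a, grid, lam), cusum_time(xs, a, theta1, lam))
```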

2.3. Extension to half-open interval. Suppose $X_1, X_2, \ldots$ are independent and identically distributed random variables with probability density $f_\xi$ of the form (2.1), and suppose we are interested in testing the null hypothesis

$$H_0 : \xi \in \Theta = (\underline{\xi}, \theta_1]$$

against the alternative hypothesis

$$H_1 : \xi \in \Lambda = \{\lambda\},$$

where $\theta_1 < \lambda$. Recall that $\Omega = (\underline{\xi}, \overline{\xi})$ is the natural parameter space of $\xi$. Assume

$$\lim_{\theta \to \underline{\xi}} E_\theta X = -\infty. \tag{2.16}$$

This condition is equivalent to $\lim_{\theta \to \underline{\xi}} b'(\theta) = -\infty$ since $b'(\theta) = E_\theta X$. Many distributions satisfy this condition. For example, (2.16) holds for the normal distributions since $E_\theta X = \theta$ and $\underline{\xi} = -\infty$. It also holds for the negative exponential density since $b(\theta) = -\log \theta$, $\underline{\xi} = 0$ and $E_\theta X = b'(\theta) = -1/\theta$.

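As in (2.4), our proposed open-ended test $M(a)$ of $H_0 : \xi \in \Theta = (\underline{\xi}, \theta_1]$ against $H_1 : \xi = \lambda$ is defined by

$$M(a) = \inf\Big\{ n \ge 1 : \sum_{i=1}^{n} \log \frac{f_\lambda(X_i)}{f_\theta(X_i)} \ge I(\lambda, \theta)\, a \ \text{ for all } \underline{\xi} < \theta \le \theta_1 \Big\}.$$

As in (2.6), $M(a)$ can be written as

$$M(a) = \inf\Big\{ n \ge 1 : \sum_{i=1}^{n} X_i \ge b'(\lambda)\, a + \sup_{\underline{\xi} < \theta \le \theta_1} \big[(n - a)\phi(\theta)\big] \Big\}, \tag{2.17}$$

where $\phi(\theta)$ is defined in (2.5). By L'Hôpital's rule and the condition in (2.16),

$$\lim_{\theta \to \underline{\xi}} \phi(\theta) = \lim_{\theta \to \underline{\xi}} \frac{b(\lambda) - b(\theta)}{\lambda - \theta} = \lim_{\theta \to \underline{\xi}} b'(\theta) = \lim_{\theta \to \underline{\xi}} E_\theta X = -\infty.$$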
6» — A - 
Thus for any $n < a$, $\sum_{i=1}^{n} X_i$ is finite but $\sup_{\underline{\xi} < \theta \le \theta_1} [(n - a)\phi(\theta)] = \infty$. So $M(a)$ will never stop at a time $n < a$. Recall that $\phi(\theta)$ is an increasing function of $\theta$; hence the supremum in (2.17) is attained at $\theta = \theta_1$ if $n \ge a$. Therefore,

$$M(a) = \inf\Big\{ n \ge a : \sum_{i=1}^{n} \log \frac{f_\lambda(X_i)}{f_{\theta_1}(X_i)} \ge I(\lambda, \theta_1)\, a \Big\}.$$

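For the problem of detecting a change in distribution from some $f_\theta$ with $\theta \in \Theta = (\underline{\xi}, \theta_1]$ to $f_\lambda$, define $M^*(a)$ from $M(a)$ as before, so that

$$M^*(a) = \inf\Big\{ n \ge a : \max_{1 \le k \le n - a + 1} \sum_{i=k}^{n} \log \frac{f_\lambda(X_i)}{f_{\theta_1}(X_i)} \ge I(\lambda, \theta_1)\, a \Big\}.$$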
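Because $M^*(a)$ in the half-open case only involves segments of length at least $a$, it can be computed with prefix sums and a running minimum. The sketch assumes $f_\xi = N(\xi, 1)$, for which (2.16) holds, and checks the fast version against a direct evaluation of the definition. The function names are hypothetical, and all prefix sums are stored only for readability.

```python
import math
import random

# Assumed normal setting: f_xi = N(xi, 1); Theta = (-inf, theta1], Lambda = {lam}.
def llr(x, lam, theta):
    return (lam - theta) * (x - (lam + theta) / 2)

def kl(lam, theta):
    return (lam - theta) ** 2 / 2

def m_star_half_open(xs, a, theta1, lam):
    """M*(a) for Theta = (xi_lower, theta1]: only segments of length >= a count.

    With prefix sums P_j of the theta1 log-likelihood ratios, the max over
    1 <= k <= n - a + 1 equals P_n - min{P_j : 0 <= j <= floor(n - a)}.
    """
    threshold = kl(lam, theta1) * a
    prefix = [0.0]              # P_0, P_1, ..., kept in full for clarity
    run_min = math.inf          # min of P_0, ..., P_{next_j - 1}
    next_j = 0
    for n, x in enumerate(xs, start=1):
        prefix.append(prefix[-1] + llr(x, lam, theta1))
        if n < a:
            continue            # M*(a) never stops before time a
        while next_j <= math.floor(n - a):
            run_min = min(run_min, prefix[next_j])
            next_j += 1
        if prefix[n] - run_min >= threshold:
            return n
    return None

def m_star_half_open_brute(xs, a, theta1, lam):
    """Direct evaluation of the displayed definition, for checking."""
    for n in range(1, len(xs) + 1):
        s = 0.0
        for k in range(n, 0, -1):                        # segment X_k, ..., X_n
            s += llr(xs[k - 1], lam, theta1)
            if n - k + 1 >= a and s >= kl(lam, theta1) * a:
                return n
    return None

rng = random.Random(6)
xs = [rng.gauss(-1.5, 1.0) for _ in range(30)] + [rng.gauss(0.0, 1.0) for _ in range(60)]
print(m_star_half_open(xs, 6.5, -0.5, 0.0), m_star_half_open_brute(xs, 6.5, -0.5, 0.0))
```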

Using arguments similar to the proof of Theorem 2.2, we have: 

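Theorem 2.3. For $a > 0$ and $\theta \in (\underline{\xi}, \theta_1]$,

$$E_\theta M^*(a) \ge \exp\big(I(\lambda, \theta)\, a\big),$$

and as $a \to \infty$,

$$\bar E_\lambda M^*(a) \le a + (C + o(1))\sqrt{a},$$

where

$$C = \frac{\lambda - \theta_1}{I(\lambda, \theta_1)} \sqrt{\frac{b''(\lambda)}{2\pi}} > 0.$$

Thus the analogue of Corollary 2.2 holds, and so $M^*(a)$ asymptotically maximizes $\log E_\theta N$ [up to $O(\sqrt{a})$] for every $\theta \in \Theta$ among all stopping times $N$ such that $\bar E_\lambda N \le \bar E_\lambda M^*(a)$.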

Table 1
Long ARL for different procedures

θ        Best possible      M*(a)              T_CM(-0.5, a)      T_CM(-1.0, a)
                            (a = 18.50)        (a = 2.92)         (a = 9.88)
-0.5     233 ± 7            206 ± 6            233 ± 7            125 ± 3
-0.6     523 ± 15           501 ± 15           518 ± 15           297 ± 8
-0.7     1384 ± 43          1324 ± 43          1227 ± 37          938 ± 29
-0.8     5157 ± 165         4688 ± 148         3580 ± 113         4148 ± 129
-0.9     22,942 ± 699       19,217 ± 606       10,613 ± 343       21,617 ± 658
-1.0     118,223 ± 3711     83,619 ± 2566      31,641 ± 1036      118,223 ± 3711

(The best possible values are obtained from an optimal envelope of Page's CUSUM procedures.)



DETECTION WITH COMPOSITE PRE-CHANGE 






2.4. Numerical examples. In this subsection we describe the results of a Monte Carlo experiment designed to check the insights obtained from the asymptotic theory of the previous subsections. The simulations consider the problem of detecting a change in a normal mean, where the pre-change distribution is $f_\theta = N(\theta, 1)$ with $\theta \in \Theta = [-1, -0.5]$, and the post-change distribution is $f_\lambda = N(\lambda, 1)$ with $\lambda \in \Lambda = \{0\}$.

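Table 1 compares our procedure $M^*(a)$ and two versions of Page's CUSUM procedure $T_{\mathrm{CM}}(\theta_0, a)$ over a range of $\theta$ values. Here

$$T_{\mathrm{CM}}(\theta_0, a) = \inf\Big\{ n \ge 1 : \max_{1 \le k \le n} \sum_{i=k}^{n} \log \frac{f_\lambda(X_i)}{f_{\theta_0}(X_i)} \ge a \Big\} = \inf\Big\{ n \ge 1 : \max_{1 \le k \le n} \sum_{i=k}^{n} (-\theta_0)\Big(X_i - \frac{\theta_0}{2}\Big) \ge a \Big\},$$

where the second form uses $\lambda = 0$; the two versions in Table 1 take $\theta_0 = -0.5$ and $\theta_0 = -1.0$.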

The threshold value $a$ for Page's CUSUM procedure $T_{\mathrm{CM}}(\theta_0, a)$ and our procedure $M^*(a)$ was determined from the criterion $\bar E_\lambda N \approx 20$. First, a $10^4$-repetition Monte Carlo simulation was performed to determine the appropriate values of $a$ to yield the desired detection delay to within the range of sampling error. With the thresholds used, the detection delay $\bar E_\lambda N$ is close enough to 20 that the difference is negligible; that is, correcting the threshold to get exactly 20 (if we knew how to do that) would change $E_\theta N$ by an amount that would make little difference in light of the simulation errors $E_\theta N$ already has. Next, using the obtained threshold value $a$, we ran 1000 repetitions to simulate the long ARL, $E_\theta N$, for different $\theta$.

Table 1 also reports the best possible $E_\theta N$ at each of the values of $\theta$ subject to $\bar E_\lambda N \approx 20$. Note that these values are obtained from an optimal envelope of Page's CUSUM procedures and therefore cannot be attained simultaneously in practice. Each result in Table 1 is recorded as the Monte Carlo estimate ± standard error.

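Table 1 shows that $M^*(a)$ performs well over a broad range of $\theta$. For instance, at $\theta = -1.0$ it achieves $E_\theta N \approx 83{,}619$ against a best possible value of about 118,223, whereas $T_{\mathrm{CM}}(-0.5, a)$ achieves only about 31,641; at $\theta = -0.5$ the figures are 206 for $M^*(a)$ and 233 for the best possible, while $T_{\mathrm{CM}}(-1.0, a)$ drops to 125. This is consistent with the asymptotic theory of $M^*(a)$ developed in Sections 2.2 and 2.3, showing that $M^*(a)$ attains [up to $O(\sqrt{a})$] the asymptotic upper bounds for $\log E_\theta N$ in Corollary 2.2 as $a \to \infty$.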
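The calibration step described above can be sketched for the two CUSUM columns of Table 1. The sketch assumes $N(\theta_0, 1)$ pre-change and $N(0, 1)$ post-change data, and it puts the change at $\nu = 1$. For Page's CUSUM a statistic started at zero is the worst case in Lorden's criterion, so this estimates $\bar E_\lambda N$. It uses 2000 replications rather than the $10^4$ in the text to keep it quick, and it does not cover $M^*(a)$ or the long-ARL runs. The function names are hypothetical. A calibration loop would adjust $a$ until the estimate is within sampling error of 20.

```python
import random

def cusum_run_length(theta0, a, lam, rng, max_n=10**6):
    """One run of T_CM(theta0, a) for N(theta0, 1) -> N(lam, 1) with the change at nu = 1."""
    w = 0.0
    for n in range(1, max_n + 1):
        x = rng.gauss(lam, 1.0)                          # every observation is post-change
        w = max(w, 0.0) + (lam - theta0) * (x - (lam + theta0) / 2)
        if w >= a:
            return n
    return max_n

def mc_delay(theta0, a, lam=0.0, reps=2000, seed=1):
    """Monte Carlo estimate of the detection delay and its standard error."""
    rng = random.Random(seed)
    runs = [cusum_run_length(theta0, a, lam, rng) for _ in range(reps)]
    mean = sum(runs) / reps
    var = sum((r - mean) ** 2 for r in runs) / (reps - 1)
    return mean, (var / reps) ** 0.5

print(mc_delay(-1.0, 9.88))    # threshold from Table 1 for T_CM(-1.0, a)
print(mc_delay(-0.5, 2.92))    # threshold from Table 1 for T_CM(-0.5, a)
```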

3. Composite post-change hypotheses. Let $\Theta$ and $\Lambda$ be two compact disjoint subsets of some Euclidean space. Let $\{f_\theta; \theta \in \Theta\}$ and $\{g_\lambda; \lambda \in \Lambda\}$ be two sets of densities, absolutely continuous with respect to the same nondegenerate $\sigma$-finite measure. In this section we are interested in detecting a change in distribution from $f_\theta$ for some $\theta \in \Theta$ to $g_\lambda$ for some $\lambda \in \Lambda$. Here we no longer assume the densities belong to exponential families, and we assume that both $\Theta$ and $\Lambda$ are composite.

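Ideally we would like a stopping time $N$ which minimizes the detection delay $\bar E_\lambda N$ for all $\lambda \in \Lambda$ and maximizes $E_\theta N$ for all $\theta \in \Theta$, that is, we seek a family $\{N(a)\}$ which is asymptotically efficient for all $(\theta, \lambda) \in \Theta \times \Lambda$. However, in general such a family does not exist. For example, for $\Lambda = \{\lambda_1, \lambda_2\}$ it is easy to see from (1.3) that there exists a family that is asymptotically efficient at both $(\theta, \lambda_1)$ and $(\theta, \lambda_2)$ for all $\theta \in \Theta$ only if $I(\lambda_2, \theta)/I(\lambda_1, \theta)$ is constant in $\theta \in \Theta$. Indeed, efficiency at both pairs (together with (1.2)) forces $I(\lambda_1, \theta)\, \bar E_{\lambda_1} N(a) \sim \log E_\theta N(a) \sim I(\lambda_2, \theta)\, \bar E_{\lambda_2} N(a)$, and the ratio $\bar E_{\lambda_1} N(a)/\bar E_{\lambda_2} N(a)$ does not depend on $\theta$. This fails in general when $\Theta$ is composite. For example, if $f_\theta$ and $g_\lambda$ belong to a one-parameter exponential family and $\Theta$ is an interval, a simple argument shows that $I(\lambda_2, \theta)/I(\lambda_1, \theta)$ is a constant if and only if $\lambda_1 = \lambda_2$.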

It is natural to consider the following definition:

Definition 3.1. A family of stopping times $\{N(a)\}$ is asymptotically optimal to first order if:

(i) for each $\theta \in \Theta$, there exists at least one $\lambda_\theta \in \Lambda$ such that the family is asymptotically efficient at $(\theta, \lambda_\theta)$; and

(ii) for each $\lambda \in \Lambda$, there exists at least one $\theta_\lambda \in \Theta$ such that the family is asymptotically efficient at $(\theta_\lambda, \lambda)$.

Remark. An equivalent definition is to require that the family $\{N(a)\}$ is asymptotically efficient at $(h_1(\delta), h_2(\delta))$ for $\delta \in \Delta$, where $\theta = h_1(\delta)$ and $\lambda = h_2(\delta)$ are onto (not necessarily one-to-one) functions from $\Delta$ to $\Theta$ and $\Lambda$, respectively. It is obvious that the standard formulation with simple $\Theta$ and our formulation in Section 2 are two special cases of this definition.

Remark. It is worth noting that a family of stopping times that is asymptotically optimal to first order is asymptotically admissible in the following sense. A family of stopping times $\{N(a)\}$ is asymptotically inadmissible if there exists another family of stopping times $\{N'(a)\}$ such that for all $\theta \in \Theta$ and all $\lambda \in \Lambda$,

$$\limsup_{a \to \infty} \frac{\log E_\theta N(a)}{\log E_\theta N'(a)} \le 1 \quad \text{and} \quad \liminf_{a \to \infty} \frac{\bar E_\lambda N(a)}{\bar E_\lambda N'(a)} \ge 1,$$

with strict inequality holding for some $\theta$ or $\lambda$. A family of stopping times is asymptotically admissible if it is not asymptotically inadmissible.

Note that when $\Lambda = \{\lambda\}$ is simple, the asymptotically optimal procedure developed in Section 2 satisfies

$$\log E_\theta N(a) \sim I(\lambda, \theta)\, a \quad \text{as } a \to \infty. \tag{3.1}$$

Here and everywhere below, $x(a) \sim y(a)$ as $a \to \infty$ means that $\lim_{a \to \infty} x(a)/y(a) = 1$. However, when one considers multiple values of the post-change parameter $\lambda$, it is no longer possible to find a procedure such that (3.1) holds for all $(\theta, \lambda) \in \Theta \times \Lambda$. A natural idea is then to seek procedures such that

$$\log E_\theta N(a) \sim p(\theta)\, a,$$

where $p(\theta)$ is suitably chosen. It turns out that for "good" choices of $p(\theta)$ one can define $\{N(a)\}$ to be asymptotically optimal to first order. To accomplish this, first consider the following definitions.

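Definition 3.2. A positive continuous function $p(\cdot)$ on $\Theta$ is an optimizer if for some positive continuous $q(\cdot)$ on $\Lambda$

$$p(\theta) = \inf_{\lambda \in \Lambda} \frac{I(\lambda, \theta)}{q(\lambda)}.$$

Similarly, $q(\cdot)$ on $\Lambda$ is an optimizer if for some positive continuous $p(\cdot)$ on $\Theta$

$$q(\lambda) = \inf_{\theta \in \Theta} \frac{I(\lambda, \theta)}{p(\theta)}.$$

Definition 3.3. Positive continuous functions $p(\cdot), q(\cdot)$ on $\Theta, \Lambda$, respectively, are an optimizer pair if for all $\theta$ and $\lambda$

$$p(\theta) = \inf_{\lambda \in \Lambda} \frac{I(\lambda, \theta)}{q(\lambda)} \quad \text{and} \quad q(\lambda) = \inf_{\theta \in \Theta} \frac{I(\lambda, \theta)}{p(\theta)}. \tag{3.2}$$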

The following proposition characterizes the relation between these two definitions.

Proposition 3.1. If $(p, q)$ is an optimizer pair, then $p$ and $q$ are optimizers. Conversely, for every optimizer $p$, there is a $q$ such that $(p, q)$ is an optimizer pair, namely,

$$q(\lambda) = \inf_{\theta \in \Theta} \frac{I(\lambda, \theta)}{p(\theta)},$$

and, similarly, for every optimizer $q$ one can obtain an optimizer pair $(p, q)$ by defining

$$p(\theta) = \inf_{\lambda \in \Lambda} \frac{I(\lambda, \theta)}{q(\lambda)}.$$

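Proof. It is obvious that $p$ and $q$ are optimizers if $(p,q)$ is an optimizer pair. Since everything is symmetric in the roles of $p$ and $q$, we only need to prove that the first equation of (3.2) holds for the case where $q$ is defined after $p$. Now fix $\theta_0 \in \Theta$. On the one hand, since $q(\lambda)$ is defined as the infimum over $\Theta$, we have $q(\lambda) \le I(\lambda,\theta_0)/p(\theta_0)$, so $p(\theta_0) \le I(\lambda,\theta_0)/q(\lambda)$ for all $\lambda \in \Lambda$. Thus

$$p(\theta_0) \le \inf_{\lambda\in\Lambda} \frac{I(\lambda,\theta_0)}{q(\lambda)}. \tag{3.3}$$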




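On the other hand, since $p$ is an optimizer by assumption, there exists a function $q_0(\cdot)$ on $\Lambda$ such that

$$p(\theta) = \inf_{\lambda\in\Lambda} \frac{I(\lambda,\theta)}{q_0(\lambda)}.$$

For any $\lambda_0 \in \Lambda$, we have $p(\theta) \le I(\lambda_0,\theta)/q_0(\lambda_0)$ and so $I(\lambda_0,\theta)/p(\theta) \ge q_0(\lambda_0)$ for all $\theta \in \Theta$. Hence

$$\inf_{\theta\in\Theta} \frac{I(\lambda_0,\theta)}{p(\theta)} \ge q_0(\lambda_0).$$

Observe that the left-hand side is just our definition of $q(\lambda_0)$, and so $q(\lambda_0) \ge q_0(\lambda_0)$. Since $\lambda_0$ is arbitrary, we have $q(\lambda) \ge q_0(\lambda)$ for all $\lambda \in \Lambda$. Thus,

$$\inf_{\lambda\in\Lambda} \frac{I(\lambda,\theta_0)}{q(\lambda)} \le \inf_{\lambda\in\Lambda} \frac{I(\lambda,\theta_0)}{q_0(\lambda)} = p(\theta_0)$$

by the definition of $p(\theta)$. The first equation of (3.2) follows at once from this and (3.3). □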



In fact, Proposition 3.1 provides a method to construct optimizer pairs. One can start with any positive continuous function $q_0(\lambda)$, get an optimizer $p(\theta)$ from it by the first equation of (3.2), and use the other part of (3.2) to get a $(p,q)$ optimizer pair. Similarly, one can also get a $(p,q)$ optimizer pair by starting with a $p_0(\theta)$.
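The construction can be carried out numerically. The Python sketch below is only an illustration under our own assumptions: $\Theta$ and $\Lambda$ are replaced by finite grids (on finite sets every infimum is a minimum, so the argument of Proposition 3.1 applies verbatim), and $I(\lambda,\theta)$ is taken from the exponential setting of Section 3.3, $I(\lambda,\theta) = \theta/\lambda - 1 - \log(\theta/\lambda)$ with $\Theta = [0.8,1]$ and $\Lambda = [2,3]$. The helper names (`kl_exp`, `p_from_q`, `q_from_p`, `is_optimizer_pair`) are hypothetical. It builds one pair from $q_0 \equiv 1$ and one from $p_0 \equiv 1$, checks both equations of (3.2) for each, and shows that the two pairs are not rescalings of each other.

```python
import math

def kl_exp(lam, theta):
    """I(lambda, theta) for exponential laws with means 1/lambda and 1/theta."""
    r = theta / lam
    return r - 1.0 - math.log(r)

def grid(lo, hi, m):
    """m equally spaced points from lo to hi (inclusive)."""
    return [lo + (hi - lo) * i / (m - 1) for i in range(m)]

def p_from_q(q, thetas, lams):
    """First equation of (3.2): p(theta) = min over lambda of I(lambda, theta) / q(lambda)."""
    return {t: min(kl_exp(l, t) / q[l] for l in lams) for t in thetas}

def q_from_p(p, thetas, lams):
    """Second equation of (3.2): q(lambda) = min over theta of I(lambda, theta) / p(theta)."""
    return {l: min(kl_exp(l, t) / p[t] for t in thetas) for l in lams}

def is_optimizer_pair(p, q, thetas, lams, rtol=1e-9):
    """Check both equations of (3.2) up to floating-point rounding."""
    p_back, q_back = p_from_q(q, thetas, lams), q_from_p(p, thetas, lams)
    return (all(math.isclose(p_back[t], p[t], rel_tol=rtol) for t in thetas)
            and all(math.isclose(q_back[l], q[l], rel_tol=rtol) for l in lams))

thetas, lams = grid(0.8, 1.0, 21), grid(2.0, 3.0, 21)

# Start from q0 = 1: an optimizer p1, then its partner q1.
p1 = p_from_q({l: 1.0 for l in lams}, thetas, lams)
q1 = q_from_p(p1, thetas, lams)

# Start from p0 = 1 instead: an optimizer q2, then its partner p2.
q2 = q_from_p({t: 1.0 for t in thetas}, thetas, lams)
p2 = p_from_q(q2, thetas, lams)

print(is_optimizer_pair(p1, q1, thetas, lams), is_optimizer_pair(p2, q2, thetas, lams))
# (p, q) and (c p, q / c) are equivalent pairs; here p1 / p2 is not constant in theta.
print(p1[thetas[-1]] / p2[thetas[-1]], p1[thetas[0]] / p2[thetas[0]])
```

Starting from $q_0 \equiv 1$ reproduces $p(\theta) = I(2,\theta)$ on the grid, the optimizer used in Section 3.3.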

Now we can define our proposed procedures based on an optimizer $p(\theta)$. First, let $\eta$ be an a priori distribution fully supported on $\Lambda$. Define an open-ended test $T(a)$ by

$$T(a) = \inf\Big\{ n \ge 1 : \inf_{\theta\in\Theta} \frac{1}{p(\theta)} \log \frac{\int_\Lambda [g_\lambda(X_1)\cdots g_\lambda(X_n)]\,\eta(d\lambda)}{f_\theta(X_1)\cdots f_\theta(X_n)} \ge a \Big\}. \tag{3.4}$$



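Then our proposed procedure is defined by $T^*(a) = \min_{k\ge1}(T_k(a) + k - 1)$, where $T_k(a)$ is obtained by applying $T(a)$ to $X_k, X_{k+1}, \ldots$. Equivalently,

$$T^*(a) = \inf\Big\{ n \ge 1 : \max_{1\le k\le n}\, \inf_{\theta\in\Theta} \frac{1}{p(\theta)} \log \frac{\int_\Lambda [g_\lambda(X_k)\cdots g_\lambda(X_n)]\,\eta(d\lambda)}{f_\theta(X_k)\cdots f_\theta(X_n)} \ge a \Big\}. \tag{3.5}$$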



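We also define a slightly different procedure $T_1^*(a)$ by switching $\inf_{\theta\in\Theta}$ with $\max_{1\le k\le n}$ in the definition of $T^*(a)$. Since $\max_k \inf_\theta \le \inf_\theta \max_k$, we always have $T_1^*(a) \le T^*(a)$.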
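For intuition, here is a direct (quadratic-time) Python sketch of $T^*(a)$ in (3.5) and of $T_1^*(a)$. It is not the paper's implementation; it assumes, for illustration only, the exponential model of Section 3.3, small finite grids for $\Theta$ and $\Lambda$, the prior $\eta$ uniform on the $\Lambda$-grid, and $p(\theta) = I(2,\theta)$. All function names are hypothetical. Because the statistics are computed from identical numbers and only the order of $\min_\theta$ and $\max_k$ differs, the sketch always returns $T_1^*(a) \le T^*(a)$.

```python
import math
import random
from itertools import accumulate

def kl_exp(lam, theta):
    """I(lambda, theta) for exponential laws with means 1/lambda and 1/theta."""
    r = theta / lam
    return r - 1.0 - math.log(r)

def p_opt(theta):
    """Assumed optimizer p(theta) = I(2, theta), obtained from q0 = 1 with Lambda = [2, 3]."""
    return kl_exp(2.0, theta)

def window_stat(pre, k, n, theta, lams):
    """(1/p(theta)) * log eta-mixture likelihood ratio of X_k, ..., X_n (1-indexed).

    pre[j] = X_1 + ... + X_j; exponential log-likelihoods depend on the window
    only through its length m and its sum s.
    """
    m, s = n - k + 1, pre[n] - pre[k - 1]
    logs = [m * math.log(l) - l * s for l in lams]       # log prod g_lambda(X_i)
    top = max(logs)                                      # log-sum-exp for stability
    log_mix = top + math.log(sum(math.exp(v - top) for v in logs) / len(lams))
    log_f = m * math.log(theta) - theta * s              # log prod f_theta(X_i)
    return (log_mix - log_f) / p_opt(theta)

def t_star(xs, a, thetas, lams):
    """T*(a) of (3.5): stop when max over k of min over theta reaches a."""
    pre = [0.0] + list(accumulate(xs))
    for n in range(1, len(xs) + 1):
        if max(min(window_stat(pre, k, n, t, lams) for t in thetas)
               for k in range(1, n + 1)) >= a:
            return n
    return None  # no alarm within the data

def t1_star(xs, a, thetas, lams):
    """T_1*(a): the same statistic with min over theta and max over k switched."""
    pre = [0.0] + list(accumulate(xs))
    for n in range(1, len(xs) + 1):
        if min(max(window_stat(pre, k, n, t, lams) for k in range(1, n + 1))
               for t in thetas) >= a:
            return n
    return None

thetas = [0.8, 0.85, 0.9, 0.95, 1.0]
lams = [2.0, 2.25, 2.5, 2.75, 3.0]
rng = random.Random(1)
# 50 pre-change observations (theta = 1), then 100 post-change ones (lambda = 2.5).
xs = [rng.expovariate(1.0) for _ in range(50)] + [rng.expovariate(2.5) for _ in range(100)]
T, T1 = t_star(xs, 10.0, thetas, lams), t1_star(xs, 10.0, thetas, lams)
print(T, T1)
```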

Our main results in this section are stated in the next theorem and its corollary, which establish the asymptotic optimality properties of $T^*(a)$ and $T_1^*(a)$. The proofs are given in Section 3.1.
Theorem 3.1. Assume that Assumptions A1 and A2 below hold and that $\Theta$ and $\Lambda$ are compact. If $p(\theta)$ is an optimizer, then $\{T^*(a)\}$ and $\{T_1^*(a)\}$ are asymptotically optimal to first order.

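Corollary 3.1. Under the assumptions of Theorem 3.1, if $\{N(a)\}$ is a family of procedures such that

$$\limsup_{a\to\infty} \frac{E_\lambda N(a)}{E_\lambda T^*(a)} \le 1 \quad\text{for all } \lambda\in\Lambda,$$

then

$$\limsup_{a\to\infty} \frac{\log E_\theta N(a)}{\log E_\theta T^*(a)} \le 1 \quad\text{for all } \theta\in\Theta.$$

Similarly, if

$$\liminf_{a\to\infty} \frac{\log E_\theta N(a)}{\log E_\theta T^*(a)} \ge 1 \quad\text{for all } \theta\in\Theta,$$

then

$$\liminf_{a\to\infty} \frac{E_\lambda N(a)}{E_\lambda T^*(a)} \ge 1 \quad\text{for all } \lambda\in\Lambda.$$

The same assertions are true if $T^*(a)$ is replaced by $T_1^*(a)$.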

Remark. Corollary 3.1 shows that our procedures $T^*(a)$ and $T_1^*(a)$ are also asymptotically optimal in the following sense: if a family of procedures $\{N(a)\}$ performs asymptotically as well as our procedures (or better) uniformly over $\Theta$, then our procedures perform asymptotically as well as $\{N(a)\}$ (or better) uniformly over $\Lambda$, and the same is true if the roles of $\Theta$ and $\Lambda$ are reversed.

Remark. Theorem 3.1 and Corollary 3.1 show another asymptotic optimality property of our procedures $T^*(a)$ and $T_1^*(a)$: if the optimizer $p(\theta)$ is constructed from $q_0(\lambda)$ by the first equation of (3.2), then our procedures asymptotically maximize $E_\theta N$ for every $\theta\in\Theta$ among all stopping times $N$ satisfying

$$q_0(\lambda)\, E_\lambda N \le \gamma \quad\text{for all } \lambda\in\Lambda,$$

where $\gamma > 0$ is given. Here $q_0(\lambda) > 0$ can be thought of as the cost per observation of delay if the post-change observations have distribution $g_\lambda$.

Remark. Instead of $T(a)$ in (3.4), we can also define the following stopping time in open-ended hypothesis testing problems:

$$\hat{T}(a) = \inf\Big\{ n \ge 1 : \inf_{\theta\in\Theta} \frac{1}{p(\theta)} \log \frac{\sup_{\lambda\in\Lambda} g_\lambda(X_1)\cdots g_\lambda(X_n)}{f_\theta(X_1)\cdots f_\theta(X_n)} \ge a \Big\} \tag{3.6}$$

and then use it to construct the corresponding procedures in change-point problems. When $f_\theta$ and $g_\lambda$ are from the same one-parameter exponential family, we can obtain an upper bound on $P_\theta(\hat{T}(a) < \infty)$ by equation (13) on page 636 in [15], and so we get a lower bound on the long ARL. The upper bound on the detection delay follows from the fact that $\hat{T}(a) \le T(a)$, since the supremum over $\Lambda$ dominates the $\eta$-average. These procedures are, therefore, also asymptotically optimal to first order if $f_\theta$ and $g_\lambda$ belong to one-parameter exponential families.

Remark. Note that if $p(\theta) \equiv 1$, then all of our procedures are just based on generalized likelihood ratios. However, in the case where $p(\theta) \equiv 1$ is not an optimizer, generalized likelihood ratio procedures may not be asymptotically optimal to first order. In fact, they are asymptotically inadmissible since they are dominated by our procedures based on an optimizer $p(\theta)$ which is obtained by starting with $p_0(\theta) \equiv 1$.

Throughout this section we impose the following assumptions on the densities $f_\theta$ and $g_\lambda$.

Assumption A1. The Kullback-Leibler information numbers $I(\lambda,\theta) = E_\lambda \log(g_\lambda(X)/f_\theta(X))$ are finite. Furthermore:

(a) $I_0 = \inf_\lambda \inf_\theta I(\lambda,\theta) > 0$,

(b) $I(\lambda,\theta)$ and $I(\lambda) = \inf_\theta I(\lambda,\theta)$ are both continuous in $\lambda$.

Assumption A2. For all $\theta, \lambda$:

(a) $E_\lambda[\log(g_\lambda(X)/f_\theta(X))]^2 < \infty$,

(b) $\lim_{\rho\to0} E_\lambda[\log \sup_{|\theta'-\theta|\le\rho} f_{\theta'}(X) - \log f_\theta(X)]^2 = 0$,

(c) $\lim_{\lambda'\to\lambda} E_\lambda[\log g_{\lambda'}(X) - \log g_\lambda(X)]^2 = 0$.

Assumptions A1 and A2 are part of Assumptions 2 and 3 in [7]. Assumption A1(a) guarantees that $\Theta$ and $\Lambda$ are "separated."

3.1. Proof of main results. First we establish the lower bound on the long ARLs of our procedures $T^*(a)$ and $T_1^*(a)$ for an arbitrary positive function $p(\theta)$.

Lemma 3.1. For all $a > 0$ and $\theta\in\Theta$,

$$\log E_\theta T^*(a) \ge \log E_\theta T_1^*(a) \ge p(\theta)\, a.$$

Proof. Define

$$t(\theta,a) = \inf\Big\{ n\ge1 : \frac{1}{p(\theta)} \log \frac{\int_\Lambda [g_\lambda(X_1)\cdots g_\lambda(X_n)]\,\eta(d\lambda)}{f_\theta(X_1)\cdots f_\theta(X_n)} \ge a \Big\}$$

and $t^*(\theta,a) = \min_{k\ge1}(t_k(\theta,a) + k - 1)$, where $t_k(\theta,a)$ is obtained by applying $t(\theta,a)$ to $X_k, X_{k+1}, \ldots$. Then it is clear that $T^*(a) \ge T_1^*(a) \ge t^*(\theta,a)$, and hence

$$E_\theta T^*(a) \ge E_\theta T_1^*(a) \ge E_\theta[t^*(\theta,a)].$$

Using Lemma 2.1 and Wald's likelihood ratio identity, we have

$$E_\theta[t^*(\theta,a)] \ge \frac{1}{P_\theta(t(\theta,a) < \infty)} \ge \exp(p(\theta)a),$$

which proves the lemma. □



Next we derive an upper bound on the detection delays of our procedures $T^*(a)$ and $T_1^*(a)$.

Lemma 3.2. Suppose that Assumptions A1 and A2 hold and $\Theta$ is compact. If $p(\theta)$ is a positive continuous function (not necessarily an optimizer) on $\Theta$, then for all $\lambda\in\Lambda$,

$$E_\lambda T_1^*(a) \le E_\lambda T^*(a) \le (1+o(1))\,\frac{a}{q(\lambda)}$$

as $a\to\infty$, where $q(\lambda)$ is defined by

$$q(\lambda) = \inf_{\theta\in\Theta} \frac{I(\lambda,\theta)}{p(\theta)}.$$

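Proof. By definition, $E_\lambda T_1^*(a) \le E_\lambda T^*(a) \le E_\lambda T(a)$, where $T(a)$ is defined in (3.4), so it suffices to show that

$$E_\lambda T(a) \le (1+o(1))\,\frac{a}{q(\lambda)}$$

for any $\lambda\in\Lambda$. We will use the method in [7] to prove this inequality. Fix $\lambda_0\in\Lambda$ and choose an arbitrary $\varepsilon > 0$. By Assumptions A1 and A2, the compactness of $\Theta$ and the continuity of $p(\theta)$, there exist a finite covering $\{U_i, 1\le i\le k_\varepsilon\}$ of $\Theta$ (with $\theta_i\in U_i$) and a positive number $\delta_\varepsilon$ such that for all $\lambda\in V_\varepsilon = \{\lambda : |\lambda-\lambda_0| < \delta_\varepsilon\}$ and $i = 1,\ldots,k_\varepsilon$,

$$E_{\lambda_0}\Big[\log g_\lambda(X) - \log \sup_{\theta\in U_i} f_\theta(X)\Big] \ge I(\lambda_0,\theta_i) - \varepsilon \tag{3.7}$$

and

$$\sup_{\theta\in U_i} |p(\theta) - p(\theta_i)| < \varepsilon.$$

Let $N_1(a)$ be the smallest $n$ such that

$$\log \int_\Lambda [g_\lambda(X_1)\cdots g_\lambda(X_n)]\,\eta(d\lambda) \ge \sup_{\theta\in\Theta}\Big[p(\theta)a + \sum_{j=1}^n \log f_\theta(X_j)\Big]. \tag{3.8}$$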
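Clearly $N_1(a) \ge T(a)$. By Jensen's inequality, the left-hand side of (3.8) is greater than or equal to

$$\int_{V_\varepsilon} \log[g_\lambda(X_1)\cdots g_\lambda(X_n)]\,\frac{\eta(d\lambda)}{\eta(V_\varepsilon)} + \log\eta(V_\varepsilon) = \sum_{j=1}^n \int_{V_\varepsilon} \log g_\lambda(X_j)\,\frac{\eta(d\lambda)}{\eta(V_\varepsilon)} - |\log\eta(V_\varepsilon)| \tag{3.9}$$

since $\eta(V_\varepsilon) \le 1$. Since $\{U_i\}$ covers $\Theta$, the right-hand side of (3.8) is less than or equal to

$$\begin{aligned} \max_{1\le i\le k_\varepsilon} \sup_{\theta\in U_i}\Big[p(\theta)a + \sum_{j=1}^n \log f_\theta(X_j)\Big] &\le \max_{1\le i\le k_\varepsilon}\Big[(p(\theta_i)+\varepsilon)a + \sup_{\theta\in U_i}\sum_{j=1}^n \log f_\theta(X_j)\Big] \\ &\le \max_{1\le i\le k_\varepsilon}\Big[(p(\theta_i)+\varepsilon)a + \sum_{j=1}^n \log\sup_{\theta\in U_i} f_\theta(X_j)\Big]. \end{aligned} \tag{3.10}$$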

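For $j = 1, 2, \ldots$, put

$$Y_j = \int_{V_\varepsilon} \log g_\lambda(X_j)\,\frac{\eta(d\lambda)}{\eta(V_\varepsilon)} \quad\text{and}\quad Z_j^i = \log\sup_{\theta\in U_i} f_\theta(X_j) \quad\text{for } i = 1,\ldots,k_\varepsilon.$$

Let $N_2(a)$ be the smallest $n$ such that

$$\sum_{j=1}^n Y_j - |\log\eta(V_\varepsilon)| \ge \max_{1\le i\le k_\varepsilon}\Big[\sum_{j=1}^n Z_j^i + (p(\theta_i)+\varepsilon)a\Big]$$

or, equivalently, the smallest $n$ such that for all $1\le i\le k_\varepsilon$,

$$\sum_{j=1}^n \frac{Y_j - Z_j^i}{p(\theta_i)} \ge a\Big(1 + \frac{\varepsilon}{p(\theta_i)}\Big) + \frac{|\log\eta(V_\varepsilon)|}{p(\theta_i)}.$$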

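Using (3.9) and (3.10), it is clear that $N_2(a) \ge N_1(a)$. Let $p_0 = \inf_{\theta\in\Theta} p(\theta)$; then $p_0 > 0$ since $p(\theta)$ is a positive continuous function and $\Theta$ is compact. Define $r_\varepsilon = |\log\eta(V_\varepsilon)|/p_0$, and let $N_3(a)$ be the smallest $n$ such that

$$\min_{1\le i\le k_\varepsilon} \sum_{j=1}^n \frac{Y_j - Z_j^i}{p(\theta_i)} \ge a\Big(1 + \frac{\varepsilon}{p_0}\Big) + r_\varepsilon,$$

or, equivalently,

$$\sum_{j=1}^n \Big(\frac{Y_j - Z_j^1}{p(\theta_1)} - \varepsilon\Big) + \min_{1\le i\le k_\varepsilon}\sum_{j=1}^n \Big(\frac{Y_j - Z_j^i}{p(\theta_i)} - \frac{Y_j - Z_j^1}{p(\theta_1)} + \varepsilon\Big) \ge a\Big(1 + \frac{\varepsilon}{p_0}\Big) + r_\varepsilon.$$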
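Clearly $N_3(a) \ge N_2(a)$. From (3.7) we have

$$E_{\lambda_0}\Big[\frac{Y_1 - Z_1^i}{p(\theta_i)} - \varepsilon\Big] \ge \frac{I(\lambda_0,\theta_i)}{p(\theta_i)} - \varepsilon\Big(1 + \frac{1}{p_0}\Big) \quad\text{for } i = 1,\ldots,k_\varepsilon. \tag{3.11}$$

For $n = 1, 2, \ldots$ define

$$S_n = \sum_{j=1}^n \Big(\frac{Y_j - Z_j^1}{p(\theta_1)} - \varepsilon\Big) \quad\text{and}\quad B_n^i = \sum_{j=1}^n \Big(\frac{Y_j - Z_j^i}{p(\theta_i)} - \frac{Y_j - Z_j^1}{p(\theta_1)} + \varepsilon\Big) \quad\text{for } i = 1,\ldots,k_\varepsilon.$$

Let $N_4(a)$ be the smallest $n$ such that, simultaneously,

$$S_n \ge a\Big(1 + \frac{\varepsilon}{p_0}\Big) + r_\varepsilon \quad\text{and}\quad \min_{1\le i\le k_\varepsilon} B_n^i \ge 0.$$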

Clearly, $N_4(a) \ge N_3(a)$. Now it suffices to show that

$$E_{\lambda_0} N_4(a) \le (1 + \kappa_\varepsilon)\,\frac{a}{q(\lambda_0)} \tag{3.12}$$

for all sufficiently large $a$, for some $\kappa_\varepsilon > 0$ which can be made arbitrarily small by choosing $\varepsilon$ sufficiently small.

To prove (3.12), assume that the $\{U_i\}$ are indexed (re-index if necessary) so that the minimum (over $i$) of the left-hand side of (3.11) occurs when $i = 1$. By the proof of Lemma 2 in [7], we have

$$E_{\lambda_0} N_4(a) \le E_{\lambda_0}(\nu_1) + E_{\lambda_0}(\nu_+)\,E_{\lambda_0}(w), \tag{3.13}$$

where

$$\nu_1 = \inf\Big\{n : S_n \ge a\Big(1 + \frac{\varepsilon}{p_0}\Big) + r_\varepsilon\Big\}, \qquad \nu_+ = \inf\{n : S_n > 0\}, \qquad w = \text{last time } \min_{1\le i\le k_\varepsilon} B_n^i < 0.$$

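By (3.11) and the definition of $q(\lambda)$,

$$E_{\lambda_0}\Big[\frac{Y_1 - Z_1^1}{p(\theta_1)} - \varepsilon\Big] \ge q(\lambda_0) - \varepsilon\Big(1 + \frac{1}{p_0}\Big).$$

Thus, if we choose $\varepsilon$ small enough so that $q(\lambda_0) - \varepsilon(1 + 1/p_0) > 0$, then it is well known from renewal theory that

$$E_{\lambda_0}\nu_1 \le (1+o(1))\,\frac{a(1 + \varepsilon/p_0) + r_\varepsilon}{q(\lambda_0) - \varepsilon(1 + 1/p_0)} \quad\text{and}\quad E_{\lambda_0}\nu_+ = g(\varepsilon) < \infty.$$

Moreover, $E_{\lambda_0}(w) = h(\varepsilon) < \infty$ because the summands in $B_n^i$ have positive mean and finite variance under $P_{\lambda_0}$; see, for example, Theorem D in [7]. Relation (3.12) follows at once from (3.13), since $r_\varepsilon$, $g(\varepsilon)$ and $h(\varepsilon)$ do not depend on $a$. Therefore, the lemma holds. □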



Proof of Theorem 3.1 and Corollary 3.1. First we establish an upper bound on $\log E_\theta T^*(a)$. By Lemma 3.2 and Lorden's lower bound (1.2),

$$\log E_\theta T^*(a) \le \inf_{\lambda\in\Lambda}\big((1+o(1))\,I(\lambda,\theta)\,E_\lambda T^*(a)\big) \le \inf_{\lambda\in\Lambda}\Big((1+o(1))\,I(\lambda,\theta)\,\frac{a}{q(\lambda)}\Big).$$

The compactness of $\Lambda$ leads to

$$\log E_\theta T^*(a) \le (1+o(1))\Big(\inf_{\lambda\in\Lambda}\frac{I(\lambda,\theta)}{q(\lambda)}\Big)a.$$

If $p(\theta)$ is an optimizer, then $(p(\theta), q(\lambda))$ is an optimizer pair by Proposition 3.1. Thus

$$\log E_\theta T^*(a) \le (1+o(1))\,p(\theta)a.$$

Combining this with Lemma 3.1 yields

$$\log E_\theta T^*(a) \sim p(\theta)a.$$

Similarly,

$$E_\lambda T^*(a) \sim a/q(\lambda),$$

and the same results are true if $T^*(a)$ is replaced by $T_1^*(a)$.

To prove Theorem 3.1, note that the asymptotic efficiency of $T^*(a)$ and $T_1^*(a)$ at $(\theta,\lambda)$ is

$$e(\theta,\lambda) = \frac{p(\theta)\,q(\lambda)}{I(\lambda,\theta)},$$

and so they are asymptotically optimal to first order by virtue of the compactness of $\Theta$ and $\Lambda$ and the definition of an optimizer pair.

Applying Lorden's lower bound, we can prove Corollary 3.1 in the same way as the upper bound for $\log E_\theta T^*(a)$. □
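The efficiency $e(\theta,\lambda) = p(\theta)q(\lambda)/I(\lambda,\theta)$ can be tabulated on grids to see why an optimizer pair matters. The Python sketch below is an illustration under our own assumptions (the exponential model of Section 3.3 with $\Theta = [0.8,1]$ and $\Lambda = [2,3]$ replaced by grids; hypothetical helper names). It compares the optimizer pair obtained from $q_0 \equiv 1$ with the generalized likelihood ratio weighting $p \equiv 1$, paired with $q(\lambda) = \inf_\theta I(\lambda,\theta)$. For the optimizer pair, $\max_\lambda e(\theta,\lambda) = 1$ at every grid point $\theta$; for $p \equiv 1$ the best efficiency at $\theta = 0.8$ stays well below 1, consistent with the remark on generalized likelihood ratio procedures before Assumption A1.

```python
import math

def kl_exp(lam, theta):
    """I(lambda, theta) for exponential laws with means 1/lambda and 1/theta."""
    r = theta / lam
    return r - 1.0 - math.log(r)

def grid(lo, hi, m):
    """m equally spaced points from lo to hi (inclusive)."""
    return [lo + (hi - lo) * i / (m - 1) for i in range(m)]

thetas, lams = grid(0.8, 1.0, 21), grid(2.0, 3.0, 21)

def best_efficiency(p, q):
    """For each theta: max over lambda of e(theta, lambda) = p(theta) q(lambda) / I(lambda, theta)."""
    return {t: max(p[t] * q[l] / kl_exp(l, t) for l in lams) for t in thetas}

# Optimizer pair from q0 = 1 (Proposition 3.1).
p_opt = {t: min(kl_exp(l, t) for l in lams) for t in thetas}
q_opt = {l: min(kl_exp(l, t) / p_opt[t] for t in thetas) for l in lams}

# Generalized likelihood ratio weighting: p = 1 and q(lambda) = inf_theta I(lambda, theta).
p_glr = {t: 1.0 for t in thetas}
q_glr = {l: min(kl_exp(l, t) for t in thetas) for l in lams}

eff_opt = best_efficiency(p_opt, q_opt)
eff_glr = best_efficiency(p_glr, q_glr)
print(min(eff_opt.values()), eff_glr[thetas[0]], eff_glr[thetas[-1]])
```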

3.2. Optimizer pairs. The following are some examples of an optimizer pair $(p,q)$ and the corresponding asymptotically optimal procedures.

Example 3.1. If there exists $I_0$ such that $\inf_{\lambda\in\Lambda} I(\lambda,\theta) = I_0$ for all $\theta\in\Theta$, then $q_0(\lambda) \equiv I_0$ yields

$$p(\theta) \equiv 1 \quad\text{and}\quad q(\lambda) = \inf_{\theta\in\Theta} I(\lambda,\theta).$$

This is true even for composite $\Theta$ and $\Lambda$. In particular, if $\Theta$ is simple, say $\{\theta_0\}$, then our consideration reduces to the standard formulation where the pre-change distribution is completely specified. Moreover, Pollak [18] proved that $T(a)$, defined in (3.4), has a second-order optimality property in the context of open-ended hypothesis testing if $f_\theta$ and $g_\lambda$ belong to exponential families.

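Example 3.2. If there exists $I_0$ such that $\inf_{\theta\in\Theta} I(\lambda,\theta) = I_0$ for all $\lambda\in\Lambda$, then $q_0(\lambda) \equiv 1$ yields

$$p(\theta) = \inf_{\lambda\in\Lambda} I(\lambda,\theta) \quad\text{and}\quad q(\lambda) \equiv 1,$$

even for composite $\Theta$ and $\Lambda$. In particular, if $\Lambda$ is simple, say $\{\lambda\}$, then the considerations of Section 3 reduce to those of the problem in Section 2.

Example 3.3. Suppose $f_\theta$ and $g_\lambda$ are exponentially distributed with unknown means $1/\theta$ and $1/\lambda$, respectively. Assume $\Theta = \{\theta : \theta\in[\theta_0,\theta_1]\}$ and $\Lambda = \{\lambda : \lambda\in[\lambda_0,\lambda_1]\}$, where $\theta_0 < \theta_1 < \lambda_0 < \lambda_1$. Then optimizer pairs $(p(\theta), q(\lambda))$ are not unique. For example, the following two pairs are nonequivalent:



Suppose $t_1^*(a)$ and $t_2^*(a)$ are the procedures defined by (3.5) for the pairs $(p_1(\theta), q_1(\lambda))$ and $(p_2(\theta), q_2(\lambda))$, respectively. Even though both $t_1^*(a)$ and $t_2^*(a)$ are asymptotically optimal to first order, $t_1^*(a)$ performs better uniformly over $\Theta$ (in the sense of a larger long ARL), while $t_2^*(a)$ performs better uniformly over $\Lambda$ (in the sense of a smaller short ARL).

3.3. Numerical simulations. In this section we report some simulation studies comparing the performance of our procedures with a procedure commonly used in the literature.

The simulations consider the problem of detecting a change in distribution from $f_\theta$ to $g_\lambda$, where $f_\theta$ and $g_\lambda$ are exponentially distributed with unknown means $1/\theta$ and $1/\lambda$, respectively, and $\theta\in\Theta = [0.8, 1]$ and $\lambda\in\Lambda = [2, 3]$.

Note that $q_0(\lambda) \equiv 1$ leads to the optimizer $p(\theta) = I(2,\theta)$, where $I(\lambda,\theta) = \theta/\lambda - 1 - \log(\theta/\lambda)$ (since $I(\lambda,\theta)$ is increasing in $\lambda$ for $\lambda > \theta$, the infimum over $\Lambda$ is attained at $\lambda = 2$), and so our procedure based on (3.6) is defined by

$$\hat{T}^*(a) = \inf\Big\{ n\ge1 : \max_{1\le k\le n}\, \inf_{0.8\le\theta\le1}\, \sup_{2\le\lambda\le3}\, \frac{\lambda-\theta}{p(\theta)} \sum_{i=k}^n \Big(\frac{\log\lambda - \log\theta}{\lambda-\theta} - X_i\Big) \ge a \Big\}.$$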

A commonly used procedure in the change-point literature is the generalized likelihood ratio procedure which specifies a nominal value $\theta_0$ of the parameter of the pre-change distribution; see [14] and [29]. The procedure is defined by the stopping time

$$\begin{aligned} \tau(\theta_0,a) &= \inf\Big\{n\ge1 : \max_{1\le k\le n}\,\sup_{2\le\lambda\le3}\,\log\prod_{i=k}^n\frac{g_\lambda(X_i)}{f_{\theta_0}(X_i)} \ge a\Big\} \\ &= \inf\Big\{n\ge1 : \max_{1\le k\le n}\,\sup_{2\le\lambda\le3}\,\sum_{i=k}^n\Big(\log\frac{\lambda}{\theta_0} - (\lambda-\theta_0)X_i\Big) \ge a\Big\}. \end{aligned}$$

Note that $\tau(\theta_0,a)$ can be thought of as our procedure $\hat{T}^*(a)$ when $\Theta$ contains the single point $\theta_0$. The choice of $\theta_0$ can be made directly by considering the pre-change distribution which is closest to the post-change distributions, because it is always more difficult to detect a smaller change. For our example, $\theta_0 = 1$.
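Both stopping rules can be evaluated exactly in $\lambda$: for a window of length $m$ with sum $s$, the function $\lambda \mapsto m\log(\lambda/\theta) - (\lambda-\theta)s$ is concave with stationary point $m/s$, independent of $\theta$, so its supremum over $[2,3]$ is attained at $m/s$ clipped to $[2,3]$. The Python sketch below uses this to compute $\tau(\theta_0,a)$ and $\hat{T}^*(a)$ by brute force; it is our illustration, not the method of [14], and the grid on $\Theta$, the seed and the helper names are assumptions. With $\Theta = \{\theta_0\}$ the statistic of $\hat{T}^*$ is the GLR statistic divided by the constant $p(\theta_0)$, so $\hat{T}^*(a)$ then coincides with $\tau(\theta_0, p(\theta_0)a)$.

```python
import math
import random
from itertools import accumulate

LAM_LO, LAM_HI = 2.0, 3.0  # Lambda = [2, 3]

def p_opt(theta):
    """Optimizer p(theta) = I(2, theta) = theta/2 - 1 - log(theta/2)."""
    return theta / 2.0 - 1.0 - math.log(theta / 2.0)

def sup_llr(m, s, theta):
    """sup over lambda in [2, 3] of m*log(lambda/theta) - (lambda - theta)*s.

    Concave in lambda with stationary point m/s, so m/s is clipped to [2, 3].
    """
    lam = min(max(m / s, LAM_LO), LAM_HI)
    return m * math.log(lam / theta) - (lam - theta) * s

def tau(xs, theta0, a):
    """GLR procedure tau(theta0, a) with a fixed nominal pre-change value theta0."""
    pre = [0.0] + list(accumulate(xs))
    for n in range(1, len(xs) + 1):
        if max(sup_llr(n - k + 1, pre[n] - pre[k - 1], theta0)
               for k in range(1, n + 1)) >= a:
            return n
    return None

def t_hat_star(xs, a, thetas):
    """hat T*(a): some window has min over the theta-grid of sup_llr / p(theta) >= a."""
    pre = [0.0] + list(accumulate(xs))
    for n in range(1, len(xs) + 1):
        for k in range(1, n + 1):
            m, s = n - k + 1, pre[n] - pre[k - 1]
            if min(sup_llr(m, s, t) / p_opt(t) for t in thetas) >= a:
                return n
    return None

rng = random.Random(7)
xs = [rng.expovariate(0.9) for _ in range(80)] + [rng.expovariate(2.5) for _ in range(80)]
grid_theta = [0.8, 0.85, 0.9, 0.95, 1.0]
print(tau(xs, 1.0, 5.02), t_hat_star(xs, 22.5, grid_theta))
```

The thresholds 5.02 and 22.50 are the ones reported in Table 2 for the continuous $\Theta$; on a grid they need not give the same false alarm rate.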

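An effective method to implement $\tau(\theta_0,a)$ numerically can be found in [14]. Similarly, we can implement $\hat{T}^*(a)$ as follows. Compute $V_n$ recursively by $V_n = \max(V_{n-1} + \log(2/0.8) - (2-0.8)X_n, 0)$. Whenever $V_n = 0$, one can begin a new cycle, discarding all previous observations and starting afresh on the incoming observations, because for all $0.8\le\theta\le1$, $2\le\lambda\le3$ and $1\le k\le n$,

$$\sum_{i=k}^n\Big(\frac{\log\lambda - \log\theta}{\lambda-\theta} - X_i\Big) \le 0,$$

since $(\log\lambda - \log\theta)/(\lambda-\theta)$ is maximized at $(\theta,\lambda) = (0.8, 2)$. Now each time a new cycle begins, compute at each stage $n = 1, 2, \ldots$

$$Q_k^{(n)} = X_n + X_{n-1} + \cdots + X_{n-k+1}, \qquad k = 1,\ldots,n.$$

Then $\hat{T}^*(a)$ is the first $n$ such that $Q_k^{(n)} \le c_k$ for some $k$, where

$$c_k = \inf_{0.8\le\theta\le1}\,\sup_{2\le\lambda\le3}\Big(k\,\frac{\log\lambda - \log\theta}{\lambda-\theta} - \frac{p(\theta)a}{\lambda-\theta}\Big).$$

To further speed up the implementation, compute $W_n$ recursively by $W_n = \max(W_{n-1} + \log2 - X_n, 0)$. Stop whenever $W_n \ge p(0.8)a/1.2$. Continue taking new observations (i.e., do not stop) whenever $W_n < p(1)a/2$. If $p(1)a/2 \le W_n < p(0.8)a/1.2$, then we also stop at time $n$ if $Q_k^{(n)} \le c_k$ for some $k$. The reasons behind this implementation are given below.

First, if at time $n_0$ we have $W_{n_0} \ge p(0.8)a/1.2 > 0$, then there exists some $k$ such that $\sum_{i=k}^{n_0}(\log2 - X_i) \ge p(0.8)a/1.2$. Thus for all $\theta\in[0.8,1]$ and $\lambda = 2$,

$$\frac{\lambda-\theta}{p(\theta)}\sum_{i=k}^{n_0}\Big(\frac{\log\lambda-\log\theta}{\lambda-\theta} - X_i\Big) \ge \frac{\lambda-\theta}{p(\theta)}\sum_{i=k}^{n_0}(\log2 - X_i) \ge \frac{\lambda-\theta}{p(\theta)}\cdot\frac{p(0.8)a}{1.2} \ge a,$$

where the first inequality holds because $(\log2 - \log\theta)/(2-\theta) \ge \log2$ for $\theta\le1$, and the last because $(2-\theta)/p(\theta) \ge 1.2/p(0.8)$ on $[0.8, 1]$.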






Table 2

Comparison of two procedures in change-point problems with
composite pre-change and composite post-change hypotheses

                        T̂*(a)            τ(1, a)
a                       22.50             5.02
E_θ N     θ = 1         601 ± 18          606 ± 19
          θ = 0.9       1448 ± 43         1207 ± 36
          θ = 0.8       3772 ± 116        2749 ± 90
E_λ N     λ = 2         21.41 ± 0.10      21.92 ± 0.11
          λ = 2.2       18.09 ± 0.07      18.18 ± 0.09
          λ = 2.5       15.08 ± 0.05      14.76 ± 0.06
          λ = 2.7       13.75 ± 0.04      13.22 ± 0.05
          λ = 3         12.29 ± 0.04      11.62 ± 0.04

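Hence, $\hat{T}^*(a)$ will stop at time $n_0$. Second, $\hat{T}^*(a)$ will never stop at time $n$ when $W_n < p(1)a/2$ because for $\theta_1 = 1$, all $2\le\lambda\le3$ and all $k$,

$$\frac{\lambda-\theta_1}{p(\theta_1)}\sum_{i=k}^n\Big(\frac{\log\lambda-\log\theta_1}{\lambda-\theta_1} - X_i\Big) \le \frac{\lambda-\theta_1}{p(\theta_1)}\sum_{i=k}^n(\log2 - X_i) \le \frac{\lambda-\theta_1}{p(\theta_1)}\,W_n < a,$$

where the first inequality uses $\log\lambda/(\lambda-1) \le \log2$ for $\lambda\ge2$, the second uses $W_n \ge \sum_{i=k}^n(\log2 - X_i)$ for every $k$, and the last uses $W_n < p(1)a/2$ and $\lambda - 1 \le 2$.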
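The cycle-and-shortcut scheme can be checked against a brute-force evaluation. The Python sketch below is our illustration, not the code behind Table 2: to keep it exact and short it replaces $\Theta$ and $\Lambda$ by small grids that contain the points $\theta = 0.8$, $\theta = 1$ and $\lambda = 2$ used in the arguments above, and all names are hypothetical. On such grids the reset argument and both $W_n$ arguments go through unchanged, so the fast and brute-force versions should return the same stopping time (up to floating-point ties, which are negligible for continuous data).

```python
import math
import random
from itertools import accumulate

THETAS = [0.8, 0.9, 1.0]  # grid on Theta = [0.8, 1]; must contain 0.8 and 1
LAMS = [2.0, 2.5, 3.0]    # grid on Lambda = [2, 3]; must contain 2

def p_opt(theta):
    """Optimizer p(theta) = I(2, theta) = theta/2 - 1 - log(theta/2)."""
    return theta / 2.0 - 1.0 - math.log(theta / 2.0)

def slope(t, l):
    """(log l - log t) / (l - t)."""
    return (math.log(l) - math.log(t)) / (l - t)

def brute_force(xs, a):
    """First n such that some window X_k..X_n has min_theta max_lambda statistic >= a."""
    pre = [0.0] + list(accumulate(xs))
    for n in range(1, len(xs) + 1):
        for k in range(1, n + 1):
            m, s = n - k + 1, pre[n] - pre[k - 1]
            if min(max((l - t) / p_opt(t) * (m * slope(t, l) - s) for l in LAMS)
                   for t in THETAS) >= a:
                return n
    return None

def fast(xs, a):
    """Cycles driven by V_n, shortcuts driven by W_n, and the test Q_k <= c_k."""
    c_cache = {}

    def c(k):  # c_k = min_theta max_lambda [k * slope - p(theta) * a / (lambda - theta)]
        if k not in c_cache:
            c_cache[k] = min(max(k * slope(t, l) - p_opt(t) * a / (l - t) for l in LAMS)
                             for t in THETAS)
        return c_cache[k]

    cycle, v, w = [], 0.0, 0.0
    for n, x in enumerate(xs, start=1):
        v = max(v + math.log(2 / 0.8) - 1.2 * x, 0.0)
        w = max(w + math.log(2.0) - x, 0.0)
        cycle.append(x)
        if w >= p_opt(0.8) * a / 1.2:
            return n                   # first argument: stopping is certain
        if w >= p_opt(1.0) * a / 2:    # below this level stopping is impossible
            q = 0.0
            for k in range(1, len(cycle) + 1):
                q += cycle[-k]         # Q_k^(n): sum of the last k observations
                if q <= c(k):
                    return n
        if v == 0.0:
            cycle = []                 # no current window can help: start a new cycle
    return None

for seed in range(3):
    rng = random.Random(seed)
    xs = [rng.expovariate(0.9) for _ in range(60)] + [rng.expovariate(2.5) for _ in range(60)]
    print(seed, fast(xs, 8.0), brute_force(xs, 8.0))
```

The $c_k$ values depend only on $k$ and $a$, so they are cached; in the paper's setting they would be computed once over the continuous $\Theta$ and $\Lambda$.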



Table 2 compares the performance of our procedure $\hat{T}^*(a)$ with that of $\tau(\theta_0,a)$, $\theta_0 = 1$. The threshold $a$ for each of these two procedures is determined from the criterion $E_{\theta=1}N(a) \approx 600$. The results in Table 2 are based on 1000 simulations for $E_\theta N$ and 10,000 simulations for $E_\lambda N$. Note that for these two procedures, the detection delay $\bar{E}_\lambda N = E_\lambda N$. Table 2 shows that at a small additional cost in detection delay, $\hat{T}^*(a)$ can significantly improve the mean times between false alarms compared to $\tau(1,a)$. This is consistent with the asymptotic theory in this section.

4. Normal distributions. Our general theory in Section 3 assumes that $\Theta$ and $\Lambda$ are compact. If they are not compact, then our proposed procedures may or may not be asymptotically optimal. However, we can still sometimes apply our ideas in these situations, as shown in the following example.

Suppose we want to detect a change from negative to positive in the mean of independent normally distributed random variables with variance 1. In the context of open-ended hypothesis testing, we want to test

$$H_0 : \theta\in\Theta = (-\infty, 0) \quad\text{against}\quad H_1 : \lambda\in\Lambda = (0, \infty).$$

Let us examine the procedures $\hat{T}(a)$ defined in (3.6) for different choices of optimizer pairs.

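First, let us assume $q_0(\lambda) = \lambda^{1/\beta}$ with $\beta \ge 1/2$; since $I(\lambda,\theta) = (\lambda-\theta)^2/2$ here, we then have the optimizer pair

$$p(\theta) = k_\beta\,|\theta|^{2-1/\beta} \quad\text{and}\quad q(\lambda) = \lambda^{1/\beta}, \qquad\text{with } k_\beta = 2\beta^2(2\beta-1)^{1/\beta-2}$$

(assume $0^0 = 1$), and thus the procedure defined in (3.6) becomes $t_\beta(a)$ = first time $n$ such that

$$\inf_{\theta<0}\,\sup_{\lambda>0}\,\frac{1}{p(\theta)}\Big((\lambda-\theta)S_n - \frac{n}{2}(\lambda^2-\theta^2)\Big) \ge a, \qquad\text{where } S_n = \sum_{i=1}^n X_i.$$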



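Letting $\theta\to0$ gives us that $S_n > 0$ if $t_\beta(a) = n$, and we can rewrite the stopping rule as

$$\inf_{\theta<0}\,\sup_{\lambda>0}\Big[-\Big(\lambda - \frac{S_n}{n}\Big)^2 + \Big(\frac{S_n}{n} - \theta\Big)^2 - \frac{2p(\theta)a}{n}\Big] \ge 0.$$

The supremum is attained at $\lambda = S_n/n$, and so $t_\beta(a)$ = first time $n$ such that for all $\theta < 0$,

$$\frac{S_n}{n} \ge \theta + \sqrt{\frac{2p(\theta)a}{n}}.$$

A routine calculation (maximizing the right-hand side over $\theta < 0$ gives exactly $(a/n)^\beta$) leads to

$$t_\beta(a) = \inf\{n\ge1 : S_n \ge a^\beta n^{1-\beta}\}.$$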
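The constant $k_\beta$ and the "routine calculation" can be double-checked numerically. The Python sketch below is our own check with arbitrary test values ($\theta = -2$, $a = 10$, $n = 40$, and three values of $\beta$); grid sizes and function names are assumptions. It compares $k_\beta|\theta|^{2-1/\beta}$ with a grid minimization of $I(\lambda,\theta)/q_0(\lambda) = (\lambda-\theta)^2/(2\lambda^{1/\beta})$, and compares $\sup_{\theta<0}[\theta + \sqrt{2p(\theta)a/n}]$ with $(a/n)^\beta$. Python evaluates `0.0 ** 0.0` as `1.0`, which matches the convention $0^0 = 1$ and gives $k_{1/2} = 1/2$.

```python
import math

def k_beta(beta):
    """k_beta = 2 beta^2 (2 beta - 1)^(1/beta - 2); 0.0 ** 0.0 == 1.0 in Python."""
    return 2.0 * beta ** 2 * (2.0 * beta - 1.0) ** (1.0 / beta - 2.0)

def p_closed(theta, beta):
    """Closed form p(theta) = k_beta |theta|^(2 - 1/beta)."""
    return k_beta(beta) * abs(theta) ** (2.0 - 1.0 / beta)

def p_grid(theta, beta, step=1e-3, upper=20.0):
    """Min over a lambda-grid in (0, upper] of (lambda - theta)^2 / (2 lambda^(1/beta))."""
    lams = (step * i for i in range(1, int(upper / step) + 1))
    return min((l - theta) ** 2 / (2.0 * l ** (1.0 / beta)) for l in lams)

def boundary_grid(a, n, beta, step=1e-3, upper=20.0):
    """Max over theta = -t (t on a grid) of theta + sqrt(2 p(theta) a / n)."""
    ts = (step * i for i in range(1, int(upper / step) + 1))
    return max(-t + math.sqrt(2.0 * p_closed(-t, beta) * a / n) for t in ts)

for beta in (0.75, 1.0, 2.0):
    print(beta, p_closed(-2.0, beta), p_grid(-2.0, beta),
          boundary_grid(10.0, 40, beta), (10.0 / 40) ** beta)
```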
This suggests using a stopping time of the form

$$t_\beta^*(a) = \inf\Big\{n\ge1 : \max_{0\le k<n}\big[(S_n - S_k)(n-k)^{\beta-1}\big] \ge a^\beta\Big\} \tag{4.1}$$

to detect a change in mean from negative to positive. Observe that for $\beta = 1$, $t_\beta(a)$ is just the one-sided SPRT and $t_\beta^*(a)$ is just a special form of Page's CUSUM procedure. For $\beta = 1/2$, $t_\beta(a)$ and $t_\beta^*(a)$ have also been studied extensively in the literature, since they are based on the generalized likelihood ratio. Different motivations for these two procedures can be found for $t_{1/2}(a)$ in Chapter IV of [28], from the viewpoint of the repeated significance test, and for $t_{1/2}^*(a)$ in [29], from the viewpoint of the generalized likelihood ratio. For $t_\beta(a)$ with $0 < \beta < 1$, see [3] and equation (9.2) on page 188 in [28].
Next, $q_0(\lambda) \equiv 1$ leads to

$$p(\theta) = \frac{\theta^2}{2} \quad\text{and}\quad q(\lambda) \equiv 1,$$

and the procedure defined in (3.6) becomes

$$t_\infty(a) = \inf\{n\ge a : S_n > 0\}.$$

Hence we use the following stopping time to detect a change in mean from negative to positive:

$$t_\infty^*(a) = \inf\Big\{n\ge a : \max_{0\le k\le n-a}(S_n - S_k) > 0\Big\}, \tag{4.2}$$

where the maximum is taken over $0 \le k \le n-a$. It is interesting to see that $t_\infty(a)$ and $t_\infty^*(a)$ can be thought of as the limits of $t_\beta(a)$ and $t_\beta^*(a)$, respectively, as $\beta\to\infty$.
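The change-point rules (4.1) and (4.2) can be coded directly from the partial sums. The Python sketch below is a brute-force illustration (quadratic in $n$; the function names, seed and thresholds are hypothetical). It also checks the remark that $\beta = 1$ gives Page's CUSUM, written here with the usual recursion $C_n = \max(C_{n-1}, 0) + X_n$, $C_0 = 0$, which equals $\max_{0\le k<n}(S_n - S_k)$.

```python
import math
import random
from itertools import accumulate

def t_star_beta(xs, a, beta):
    """(4.1): first n with max over 0 <= k < n of (S_n - S_k) (n - k)^(beta - 1) >= a^beta."""
    s = [0.0] + list(accumulate(xs))
    for n in range(1, len(xs) + 1):
        if max((s[n] - s[k]) * (n - k) ** (beta - 1.0) for k in range(n)) >= a ** beta:
            return n
    return None

def t_star_inf(xs, a):
    """(4.2): first n >= a with max over 0 <= k <= n - a of (S_n - S_k) > 0."""
    s = [0.0] + list(accumulate(xs))
    for n in range(1, len(xs) + 1):
        if n >= a and max(s[n] - s[k] for k in range(math.floor(n - a) + 1)) > 0:
            return n
    return None

def cusum(xs, a):
    """Page's CUSUM: C_n = max(C_{n-1}, 0) + X_n; stop at the first n with C_n >= a."""
    c = 0.0
    for n, x in enumerate(xs, start=1):
        c = max(c, 0.0) + x
        if c >= a:
            return n
    return None

rng = random.Random(3)
# Mean -0.5 up to time 100, mean +0.5 afterwards (unit variance).
xs = [rng.gauss(-0.5, 1.0) for _ in range(100)] + [rng.gauss(0.5, 1.0) for _ in range(100)]
print(t_star_beta(xs, 6.0, 1.0), cusum(xs, 6.0), t_star_beta(xs, 6.0, 0.75), t_star_inf(xs, 10))
```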

Though one cannot use our theorems directly to analyze the properties of $t_\beta^*(a)$ and $t_\infty^*(a)$, they are indeed asymptotically optimal to first order. For $\beta > 1/2$, first note that

$$p(\theta)q(\lambda) = I(\lambda,\theta) \quad\text{if } \theta = -(2\beta-1)\lambda.$$

By nonlinear renewal theory ([28], Chapters 9 and 10),

$$E_\lambda t_\beta^*(a) \sim a/q(\lambda).$$

Equation (13) on page 636 in [15] shows that for any $\theta < 0$,

$$P_\theta(t_\beta(a) < \infty) \le \exp(-(1+o(1))\,p(\theta)a),$$

and so Lemma 2.1 implies $\log E_\theta t_\beta^*(a) \sim p(\theta)a$ as $a\to\infty$. Thus $t_\beta^*(a)$ is asymptotically efficient at $(\theta,\lambda)$ with $\theta = -(2\beta-1)\lambda$, and hence $t_\beta^*(a)$ is asymptotically optimal to first order. Similarly, the asymptotic optimality property of $t_\infty^*(a)$ can be proved directly since the structure of $t_\infty^*(a)$ is very simple.

Remark. The above arguments establish the following optimality properties of $t_\beta(a)$ and $t_\beta^*(a)$. Suppose we want to test

$$H_{0,\delta} : \theta = -(2\beta-1)\delta \quad\text{against}\quad H_{1,\delta} : \lambda = \delta,$$

where $\beta > 1/2$ is given but $\delta > 0$ is unknown. Then $t_\beta(a)$ is an asymptotically optimal solution for all $\delta > 0$, while $t_\beta^*(a)$ is asymptotically optimal in the problems of detecting a change from $H_{0,\delta}$ to $H_{1,\delta}$ for all $\delta > 0$. As far as we know, no optimality properties of $t_\beta(a)$ and $t_\beta^*(a)$ have been studied except for the special cases $\beta = 1/2$ and $\beta = 1$. Even for the case $\beta = 1/2$, which was studied in [29], our method is simpler and more instructive.

5. Proof of Theorem 2.1. The basic idea in proving Theorem 2.1 is to relate the stopping time $M(a)$ in (2.4) to new stopping times defined by

$$M_\theta(a) = \inf\Big\{n\ge1 : \sum_{i=1}^n \log\frac{g_\lambda(X_i)}{f_\theta(X_i)} - I(\lambda,\theta)a \ge 0\Big\}. \tag{5.1}$$

The proof of Theorem 2.1 is based on the following lemmas.

Lemma 5.1. For all $\theta\in[\theta_0,\theta_1]$,

$$P_\theta(M(a) < \infty) \le P_\theta(M_\theta(a) < \infty) \le \exp(-I(\lambda,\theta)a),$$

and hence (2.7) holds.

Proof. The first inequality follows at once from the fact that $M(a) \ge M_\theta(a)$ for all $\theta\in[\theta_0,\theta_1]$, and the second inequality is a direct application of Wald's likelihood ratio identity. □

We now derive approximations for $E_\lambda M(a)$. Similarly to (2.6), $M_\theta(a)$ in (5.1) can be written as

$$M_\theta(a) = \inf\{n\ge1 : S_n \ge b'(\lambda)a + (n-a)\phi(\theta)\}.$$

As we said earlier, the supremum in (2.6) is attained at $\theta = \theta_0$ if $n \le a$, and at $\theta = \theta_1$ if $n > a$, so that

$$\{M_{\theta_0}(a) = m\} = \{M(a) = M_{\theta_0}(a) = m\} \quad\text{for all } m \le a. \tag{5.2}$$

For simplicity, we omit $a$ and $\theta$, writing $M = M(a)$ and $M_k = M_{\theta_k}(a)$ for $k = 0, 1$.
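The equivalence of (5.1) with its boundary-crossing form can be checked numerically. The Python sketch below assumes, for illustration only, the unit-variance normal family $f_\theta(x) \propto \exp(\theta x - b(\theta))$ with $b(\theta) = \theta^2/2$, and takes $\phi(\theta) = (b(\lambda) - b(\theta))/(\lambda - \theta)$; this is the choice that makes the two forms agree, and we state it as our reading of (2.5) rather than a quotation. The values $\theta_0 = -1$, $\theta_1 = -0.5$, $\lambda = 0.5$, $a = 20$ and the seeds are arbitrary. The sketch also checks that $\phi(\theta_0) < \phi(\theta_1)$, which is why the larger of the two boundaries switches from $\theta_0$ to $\theta_1$ at $n = a$.

```python
import random

def b(t):
    """Cumulant function of the unit-variance normal family (assumed)."""
    return t * t / 2.0

def b_prime(t):
    return t

def phi(theta, lam):
    """Assumed phi(theta) = (b(lam) - b(theta)) / (lam - theta)."""
    return (b(lam) - b(theta)) / (lam - theta)

def kl(lam, theta):
    """I(lam, theta) = (lam - theta) b'(lam) - (b(lam) - b(theta))."""
    return (lam - theta) * b_prime(lam) - (b(lam) - b(theta))

def m_theta_llr(xs, a, theta, lam):
    """(5.1): first n with log-likelihood ratio minus I(lam, theta) a >= 0."""
    llr = 0.0
    for n, x in enumerate(xs, start=1):
        llr += (lam - theta) * x - (b(lam) - b(theta))
        if llr - kl(lam, theta) * a >= 0:
            return n
    return None

def m_theta_boundary(xs, a, theta, lam):
    """The same rule written as S_n >= b'(lam) a + (n - a) phi(theta)."""
    s = 0.0
    for n, x in enumerate(xs, start=1):
        s += x
        if s >= b_prime(lam) * a + (n - a) * phi(theta, lam):
            return n
    return None

theta0, theta1, lam, a = -1.0, -0.5, 0.5, 20.0
rng = random.Random(11)
xs = [rng.gauss(lam, 1.0) for _ in range(200)]
print(m_theta_llr(xs, a, theta0, lam), m_theta_boundary(xs, a, theta0, lam))
```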

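Lemma 5.2. As $a\to\infty$,

$$E_\lambda M(a) = a + \frac{\phi(\theta_1) - \phi(\theta_0)}{b'(\lambda) - \phi(\theta_1)}\,E_\lambda(a - M_0; M_0 < a) + O(1).$$

Proof. Observe that

$$E_\lambda M = a - E_\lambda(a - M; M < a) + E_\lambda(M - a; M > a),$$

and by (5.2), $E_\lambda(a - M; M < a) = E_\lambda(a - M_0; M_0 < a)$. Thus it suffices to show that

$$E_\lambda(M - a; M > a) = \frac{b'(\lambda) - \phi(\theta_0)}{b'(\lambda) - \phi(\theta_1)}\,E_\lambda(a - M_0; M_0 < a) + O(1). \tag{5.3}$$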

To prove this, define a stopping time

$$N_k(u) = \inf\Big\{n\ge1 : \sum_{i=1}^n (X_i - \phi(\theta_k)) \ge u\Big\},$$

for $k = 0, 1$ and any $u > 0$. Assume $a$ is an integer. (For general $a$, using $[a]$, the largest integer $\le a$, permits one to carry through the following argument with minor modifications.) By (5.2) we have

$$E_\lambda(M - a \mid M > a) = \int_{-\infty}^0 E_\lambda(M - a \mid S_a - b'(\lambda)a = x, M > a)\,P_\lambda(S_a - b'(\lambda)a \in dx \mid M > a).$$

Conditional on the event $\{S_a - b'(\lambda)a = x, M_0 > a\}$,

$$\begin{aligned} M - a &= \inf\{m : X_{a+1} + \cdots + X_{a+m} + S_a \ge b'(\lambda)a + m\phi(\theta_1)\} \\ &= \inf\Big\{m : \sum_{i=1}^m (X_{a+i} - \phi(\theta_1)) \ge b'(\lambda)a - S_a = -x\Big\}, \end{aligned}$$
which has the same distribution as $N_1(-x)$ since $X_1, X_2, \ldots$ are independent and identically distributed. Thus

$$E_\lambda(M - a \mid M > a) = \int_{-\infty}^0 E_\lambda N_1(-x)\,P_\lambda(S_a - b'(\lambda)a \in dx \mid M_0 > a). \tag{5.4}$$

Similarly,

$$E_\lambda(M_0 - a \mid M_0 > a) = \int_{-\infty}^0 E_\lambda N_0(-x)\,P_\lambda(S_a - b'(\lambda)a \in dx \mid M_0 > a). \tag{5.5}$$

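Now for $k = 0, 1$ and any $u > 0$, define

$$R_k(u) = \sum_{i=1}^{N_k(u)} (X_i - \phi(\theta_k)) - u.$$

Then, by Theorem 1 in [13],

$$\sup_{u>0} E_\lambda R_k(u) \le \frac{E_\lambda(X_1 - \phi(\theta_k))^2}{b'(\lambda) - \phi(\theta_k)} < \infty.$$

By Wald's equation, $(b'(\lambda) - \phi(\theta_k))\,E_\lambda N_k(u) = u + E_\lambda R_k(u)$, so that

$$\sup_{u>0}\Big|E_\lambda N_k(u) - \frac{u}{b'(\lambda) - \phi(\theta_k)}\Big| < \infty$$

for $k = 0, 1$. Hence, we have

$$\sup_{u>0}\Big|E_\lambda N_1(u) - \frac{b'(\lambda) - \phi(\theta_0)}{b'(\lambda) - \phi(\theta_1)}\,E_\lambda N_0(u)\Big| < \infty.$$

Plugging into (5.4), and comparing with (5.5), we have

$$E_\lambda(M - a \mid M > a) = \frac{b'(\lambda) - \phi(\theta_0)}{b'(\lambda) - \phi(\theta_1)}\,E_\lambda(M_0 - a \mid M_0 > a) + O(1).$$

Relation (5.3) follows at once from the fact that $\{M > a\} = \{M_0 > a\}$ and the fact that $E_\lambda(M_0 - a) = O(1)$. Hence, the lemma holds. □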
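The two renewal-theoretic facts used here, Wald's equation $(b'(\lambda) - \phi(\theta_k))E_\lambda N_k(u) = u + E_\lambda R_k(u)$ and the uniform overshoot bound from [13], can be illustrated by simulation. In the Python sketch below the increments $X_i - \phi(\theta_k)$ are assumed, for illustration, to be normal with mean $\mu = 1$ and variance 1, so the bound reads $\sup_u E R(u) \le E(X_1 - \phi)^2/\mu = 2$; the number of replications, the seed and the values of $u$ are arbitrary.

```python
import random
import statistics

def ladder(rng, u, mu, sd):
    """Return (N(u), R(u)) for a Gaussian random walk with drift mu > 0 and level u."""
    total, n = 0.0, 0
    while total < u:
        total += rng.gauss(mu, sd)
        n += 1
    return n, total - u

rng = random.Random(5)
mu, sd, reps = 1.0, 1.0, 4000
results = {}
for u in (5.0, 20.0, 80.0):
    draws = [ladder(rng, u, mu, sd) for _ in range(reps)]
    mean_n = statistics.mean(n for n, _ in draws)
    mean_r = statistics.mean(r for _, r in draws)
    # Wald's equation: mu * E N(u) - (u + E R(u)) = 0.
    # Overshoot bound from [13]: E R(u) <= E(increment^2) / mu = (mu^2 + sd^2) / mu = 2.
    results[u] = (mu * mean_n - (u + mean_r), mean_r)
    print(u, *results[u])
```

The sample version of the Wald gap fluctuates around zero with standard deviation about $\sqrt{E N(u)/\text{reps}}$, while the mean overshoot stays roughly constant in $u$ and below the bound.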

Lemma 5.3. Suppose $Y_1, Y_2, \ldots$ are independent and identically distributed with mean $\mu > 0$ and finite variance $\sigma^2$. Define

$$N_a = \inf\Big\{n\ge1 : \sum_{i=1}^n Y_i \ge a\Big\}.$$

Then as $a\to\infty$,
Proof. The lemma follows at once from the well-known facts that as $a\to\infty$,

$$E(N_a) = \frac{a}{\mu} + O(1) \quad\text{and}\quad \operatorname{Var}(N_a) = (1+o(1))\,\frac{\sigma^2 a}{\mu^3},$$

and that

$$\frac{N_a - a/\mu}{\sqrt{\sigma^2 a/\mu^3}}$$

is asymptotically standard normal. See page 372 in [4], equation (5) in [27] and Theorem 8.34 in [28]. □
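The facts quoted in this proof are easy to see in simulation. The Python sketch below is a Monte Carlo illustration under our own choices ($Y_i$ normal with $\mu = 1$ and $\sigma = 1$, $a = 200$, 2000 replications, fixed seed): the sample mean of $N_a$ should be within a few units of $a/\mu = 200$, the sample variance close to $\sigma^2 a/\mu^3 = 200$, and the standardized values should have standard deviation close to 1.

```python
import random
import statistics

def first_passage(rng, a, mu, sigma):
    """N_a = inf{n >= 1 : Y_1 + ... + Y_n >= a} for i.i.d. N(mu, sigma^2) summands."""
    total, n = 0.0, 0
    while total < a:
        total += rng.gauss(mu, sigma)
        n += 1
    return n

rng = random.Random(2024)
a, mu, sigma, reps = 200.0, 1.0, 1.0, 2000
samples = [first_passage(rng, a, mu, sigma) for _ in range(reps)]
mean_n = statistics.mean(samples)
var_n = statistics.variance(samples)
# Standardized values (N_a - a/mu) / sqrt(sigma^2 a / mu^3) should look standard normal.
z = [(n - a / mu) / (sigma ** 2 * a / mu ** 3) ** 0.5 for n in samples]
print(mean_n, var_n, statistics.mean(z), statistics.stdev(z))
```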

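Proof of Theorem 2.1. Relation (2.7) is proved in Lemma 5.1. By (5.1), $M_0 = M_{\theta_0}(a)$ can be written as

$$M_0 = \inf\Big\{n\ge1 : \sum_{i=1}^n \frac{X_i - \phi(\theta_0)}{b'(\lambda) - \phi(\theta_0)} \ge a\Big\}.$$

By Lemma 5.3 it is easy to show that



where $\sigma = \sqrt{\operatorname{Var}_\lambda(X_1)}\,/\,(b'(\lambda) - \phi(\theta_0))$. Thus relation (2.8) holds by Lemma 5.2 and the definition of $\phi(\theta)$ in (2.5). □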